StyleShot: A Snapshot on Any Style

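Junyao Gao, Yanchen Liu, Yanan Sun<sup>‡</sup>, Yinhao Tang, Yanhong Zeng, Kai Chen*, Cairong Zhao* <br><br> (* corresponding authors, <sup>‡</sup> project leader)

From Tongji University and Shanghai AI Laboratory.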

</div>

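Abstract

In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery. Designed specifically for style learning, the style-aware encoder is trained with a decoupling training strategy to extract expressive style representations, while StyleGallery provides the generalization ability. We further employ a content-fusion encoder to enhance image-driven style transfer. We highlight that our approach, named StyleShot, is simple yet effective at mimicking a variety of desired styles, such as 3D, flat, abstract, and even fine-grained styles, without test-time tuning. Rigorous experiments show that StyleShot achieves superior performance across a wide range of styles compared with existing state-of-the-art methods.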

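Architecture diagram

News

[2024/7/5] 🔥 We released the online demo on HuggingFace.
[2024/7/3] 🔥 We released StyleShot_lineart, a version that takes the line art of the content image as the control.
[2024/7/2] 🔥 We released the paper.
[2024/7/1] 🔥 We released the code, checkpoints, project page, and online demo.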

Getting Started

# Install StyleShot
git clone https://github.com/Jeoyal/StyleShot.git
cd StyleShot

# Create the conda environment
conda create -n styleshot python==3.8
conda activate styleshot
pip install -r requirements.txt

# Download the models
git lfs install
git clone https://huggingface.co/Gaojunyao/StyleShot
git clone https://huggingface.co/Gaojunyao/StyleShot_lineart
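
Where git lfs is not available, the same two model repositories can likely be fetched with the huggingface_hub Python package instead. The sketch below is not part of the StyleShot repository. It assumes huggingface_hub is installed (pip install huggingface_hub), and the helper names MODEL_REPOS, local_dir_for, and download_models are hypothetical, chosen here to mirror the folder names that git clone would create.

```python
from pathlib import Path

# Repository IDs taken from the two Hugging Face git clone commands above.
MODEL_REPOS = ["Gaojunyao/StyleShot", "Gaojunyao/StyleShot_lineart"]


def local_dir_for(repo_id: str, root: str = ".") -> Path:
    """Mirror git clone's default: the folder is named after the repository."""
    return Path(root) / repo_id.split("/")[-1]


def download_models(root: str = ".") -> list:
    """Download every repository in MODEL_REPOS and return the local folders."""
    # Imported lazily so local_dir_for works without the package installed.
    from huggingface_hub import snapshot_download

    folders = []
    for repo_id in MODEL_REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        folders.append(target)
    return folders
```

Calling download_models() from inside the StyleShot checkout should leave StyleShot/ and StyleShot_lineart/ folders next to the demo scripts, matching the layout produced by the git commands.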

Models

You can download our pretrained weights from here. To run the demo, you also need to download the following models:

Inference

For inference, download the pretrained weights and prepare your own reference style image or content image.

# Run the text-driven style transfer demo
python styleshot_text_driven_demo.py --style "{style_image_path}" --prompt "{prompt}" --output "{save_path}"

# Run the image-driven style transfer demo
python styleshot_image_driven_demo.py --style "{style_image_path}" --content "{content_image_path}" --preprocessor "Contour" --prompt "{prompt}" --output "{save_path}"

# Integrate StyleShot with ControlNet and T2I-Adapter
python styleshot_t2i-adapter_demo.py --style "{style_image_path}" --condition "{condition_image_path}" --prompt "{prompt}" --output "{save_path}"
python styleshot_controlnet_demo.py --style "{style_image_path}" --condition "{condition_image_path}" --prompt "{prompt}" --output "{save_path}"

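styleshot_text_driven_demo: text-driven style transfer based on a reference style image and a text prompt.

<div align="center"> <img src=assets/text_driven.png> <p>Visualization of text-driven style transfer</p> </div>

styleshot_image_driven_demo: image-driven style transfer based on a reference style image and a content image.

<div align="center"> <img src=assets/image_driven.png> <p>Visualization of image-driven style transfer</p> </div>

styleshot_controlnet_demo, styleshot_t2i-adapter_demo: integration with ControlNet and T2I-Adapter.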
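
The demo scripts take one style image per call, so applying the same prompt to a folder of style references means repeating the command. A minimal sketch of that loop follows. It only assembles argument lists from the --style, --prompt, and --output flags shown above. The helper names, the accepted image suffixes, and the "<name>_styled.png" output naming are assumptions made for illustration, not part of the repository.

```python
import shlex
from pathlib import Path

# Assumed set of image extensions; adjust to match your files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def text_driven_commands(style_dir, prompt, output_dir):
    """Return one argument list per style image found in style_dir."""
    commands = []
    for style in sorted(Path(style_dir).iterdir()):
        if style.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image file
        # Hypothetical naming scheme for the saved result.
        output = Path(output_dir) / f"{style.stem}_styled.png"
        commands.append([
            "python", "styleshot_text_driven_demo.py",
            "--style", str(style),
            "--prompt", prompt,
            "--output", str(output),
        ])
    return commands


def preview(commands):
    """Render each argument list as a shell-quoted string for inspection."""
    return [shlex.join(cmd) for cmd in commands]
```

Each returned list can be handed to subprocess.run(cmd, check=True) from the repository root. Adding "--content" and "--preprocessor" pairs and changing the script name would adapt the same loop to styleshot_image_driven_demo.py.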

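Training

We train StyleShot with a two-stage strategy to better integrate content and style. For training data, you can use our training dataset StyleGallery or convert your own dataset into a JSON file.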

# Training stage 1: train only the style components.
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train_styleshot_stage_1.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
  --image_encoder_path="{image_encoder_path}" \
  --image_json_file="{data.json}" \
  --image_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=16 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000

# Training stage 2: train only the content components.
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train_styleshot_stage_2.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
  --pretrained_ip_adapter_path="./pretrained_weight/ip.bin" \
  --pretrained_style_encoder_path="./pretrained_weight/style_aware_encoder.bin" \
  --image_encoder_path="{image_encoder_path}" \
  --image_json_file="{data.json}" \
  --image_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=16 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000
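
The two commands above differ only in the script name and in the two pretrained-weight paths that stage 2 adds. A small sketch that makes this explicit is given below. It rebuilds both commands from one shared set of arguments. The flag names and default values are copied from the commands above, while the function name launch_command and its parameters are hypothetical.

```python
# Flags that both stages pass with the same fixed values.
SHARED_ARGS = {
    "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5/",
    "mixed_precision": "fp16",
    "resolution": 512,
    "train_batch_size": 16,
    "dataloader_num_workers": 4,
    "learning_rate": "1e-04",
    "weight_decay": 0.01,
    "save_steps": 10000,
}

# Flags that appear only in the stage 2 command.
STAGE2_ARGS = {
    "pretrained_ip_adapter_path": "./pretrained_weight/ip.bin",
    "pretrained_style_encoder_path": "./pretrained_weight/style_aware_encoder.bin",
}


def launch_command(stage, image_encoder_path, image_json_file,
                   image_root_path, output_dir):
    """Build the accelerate argument list for training stage 1 or 2."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    args = dict(SHARED_ARGS)
    if stage == 2:
        args.update(STAGE2_ARGS)
    args.update({
        "image_encoder_path": image_encoder_path,
        "image_json_file": image_json_file,
        "image_root_path": image_root_path,
        "output_dir": output_dir,
    })
    cmd = ["accelerate", "launch", "--num_processes", "8", "--multi_gpu",
           "--mixed_precision", "fp16",
           f"tutorial_train_styleshot_stage_{stage}.py"]
    cmd += [f"--{name}={value}" for name, value in args.items()]
    return cmd
```

Running stage 1 to completion before building the stage 2 command keeps the two phases separate, which fits the description of training the style components first and the content components second.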

StyleGallery<a name="style_gallery"></a>

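We curated a style-balanced dataset, StyleGallery, which covers a wide range of image styles drawn from publicly available datasets, to train StyleShot. To prepare StyleGallery, please refer to the tutorial, or download the json file from here.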
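
For a custom dataset, the training scripts expect a JSON file passed through --image_json_file together with an image folder passed through --image_root_path. This README does not spell out the JSON schema, so the sketch below is an assumption: it writes a list of records with hypothetical "image_file" and "text" keys, a layout used by some adapter-style training scripts. Check the dataset class in tutorial_train_styleshot_stage_1.py for the field names it actually reads before relying on it.

```python
import json
from pathlib import Path


def build_records(image_root, captions):
    """Turn {relative file name: caption} into a list of JSON-ready records.

    The "image_file" and "text" keys are assumed, not confirmed by the README.
    """
    root = Path(image_root)
    records = []
    for name, text in sorted(captions.items()):
        if not (root / name).is_file():
            raise FileNotFoundError(root / name)  # catch typos early
        records.append({"image_file": name, "text": text})
    return records


def write_json(records, json_path):
    """Save the records as UTF-8 JSON, keeping non-ASCII captions readable."""
    Path(json_path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

File names are stored relative to the image root so that the same JSON file keeps working when the image folder is moved.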

StyleBench

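To address the lack of a benchmark for reference-based stylized generation, we established a <a href='https://drive.google.com/file/d/1I-Zv5blsrJsckXrvcP_f8TJ4gy6xrwCA/view?usp=drive_link'>style evaluation benchmark</a> containing 73 distinct styles across 490 reference images.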

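Disclaimer

We developed this repository for research purposes, so it may only be used for personal, research, or other non-commercial purposes.

Citation

If you find StyleShot useful for your research and applications, please cite it using the following BibTeX: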

@article{gao2024styleshot,
  title={StyleShot: A Snapshot on Any Style},
  author={Gao, Junyao and Liu, Yanchen and Sun, Yanan and Tang, Yinhao and Zeng, Yanhong and Chen, Kai and Zhao, Cairong},
  journal={arXiv preprint arXiv:2407.01414},
  year={2024}
}