AnimateDiff

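This repository is the official implementation of AnimateDiff [ICLR2024 Spotlight]. It is a plug-and-play module that turns most community text-to-image models into animation generators, without the need for additional training.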

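AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Yuwei Guo, Ceyuan Yang✝, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai (✝Corresponding Author)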

Note: The main branch is for Stable Diffusion V1.5; for Stable Diffusion XL, please refer to the sdxl-beta branch.

Quick Demos

More results can be found in the Gallery. Some of them are contributed by the community.

<table class="center"> <tr> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/7b0a76cb-47fb-4925-99d8-592788e6ac2e.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/d61c0353-1ca1-43cc-a534-fdbdeb29a12d.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/afd080be-8e1b-4c90-8621-dc3e4162b003.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/6bf75623-0aec-4199-acbe-70c40c78cbab.gif"></td> </tr> </table> <p style="margin-left: 2em; margin-top: -1em">Model: <a href="https://civitai.com/models/30240/toonyou">ToonYou</a></p> <table> <tr> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/c99a19ac-a959-4cc6-be89-81c571535252.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/a8b0fccc-6c6b-4e5f-8b01-e8e836f41fec.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/00e4695e-f5a3-445b-92b4-71f830589e62.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/89564a94-316d-4a84-b2f6-1922f39ed0b6.gif"></td> </tr> </table> <p style="margin-left: 2em; margin-top: -1em">Model: <a href="https://civitai.com/models/4201/realistic-vision-v20">Realistic Vision V2.0</a></p>

Quick Start

Note: AnimateDiff is also officially supported by Diffusers. Visit the AnimateDiff Diffusers tutorial for more details. The instructions below are for working with this repository.

Note: For all scripts, checkpoints are downloaded automatically, so the first run of a script may take longer than usual.

1. Set up the repository and environment

git clone https://github.com/guoyww/AnimateDiff.git
cd AnimateDiff

pip install -r requirements.txt

2. Launch the sampling script!

Generated samples can be found in the samples/ folder.

2.1 Generate animations with community models

python -m scripts.animate --config configs/prompts/1_animate/1_1_animate_RealisticVision.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_2_animate_FilmVelvia.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_3_animate_ToonYou.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_4_animate_MajicMix.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_5_animate_RcnzCartoon.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_6_animate_Lyriel.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_7_animate_Tusun.yaml
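To run every community-model example in one go, a small driver can loop over the configs above. This is a minimal sketch, assuming it is run from the repository root with the environment from step 1 active; the helper names `build_command` and `run_all` are hypothetical and not part of this repository.

```python
import subprocess
import sys

# Config paths copied from the commands above.
CONFIGS = [
    "configs/prompts/1_animate/1_1_animate_RealisticVision.yaml",
    "configs/prompts/1_animate/1_2_animate_FilmVelvia.yaml",
    "configs/prompts/1_animate/1_3_animate_ToonYou.yaml",
    "configs/prompts/1_animate/1_4_animate_MajicMix.yaml",
    "configs/prompts/1_animate/1_5_animate_RcnzCartoon.yaml",
    "configs/prompts/1_animate/1_6_animate_Lyriel.yaml",
    "configs/prompts/1_animate/1_7_animate_Tusun.yaml",
]


def build_command(config: str) -> list:
    """Return the argv for one sampling run, using the current Python interpreter."""
    return [sys.executable, "-m", "scripts.animate", "--config", config]


def run_all(configs, dry_run: bool = False) -> list:
    """Run each config in order; with dry_run=True, only build and return the commands."""
    commands = [build_command(c) for c in configs]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError and stops at the first failing config.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all(CONFIGS)` runs the seven samples sequentially. Using `sys.executable` keeps the child processes on the same interpreter (and therefore the same installed requirements) as the driver itself.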

2.2 Generate animations with MotionLoRA control

python -m scripts.animate --config configs/prompts/2_motionlora/2_motionlora_RealisticVision.yaml

2.3 More control with SparseCtrl RGB and sketch

python -m scripts.animate --config configs/prompts/3_sparsectrl/3_1_sparsectrl_i2v.yaml
python -m scripts.animate --config configs/prompts/3_sparsectrl/3_2_sparsectrl_rgb_RealisticVision.yaml
python -m scripts.animate --config configs/prompts/3_sparsectrl/3_3_sparsectrl_sketch_RealisticVision.yaml

2.4 Gradio app

We created a Gradio demo to make AnimateDiff easier to use. By default, the demo runs at localhost:7860.

python -u app.py
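When scripting around the demo (for example, on a remote machine or in a CI job), it can help to wait until the default port accepts connections before opening a browser or sending requests. A minimal standard-library sketch; `wait_for_port` is a hypothetical helper, not part of this repository, and it only confirms that something is listening on the port, not that the demo has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 7860, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) if nothing is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```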

Technical Explanation

AnimateDiff

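AnimateDiff aims to learn transferable motion priors that can be applied to other variants of the Stable Diffusion family. To this end, we design the following training pipeline, which consists of three stages.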

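Stage 1, Alleviate Negative Effects: we train a domain adapter, e.g., v3_sd15_adapter.ckpt, to fit the defective visual artifacts (e.g., watermarks) in the training dataset. This also benefits the disentangled learning of motion and spatial appearance. By default, the adapter can be removed at inference. It can also be integrated into the model, with its effect adjusted by a LoRA scaler.
Stage 2, Learn Motion Priors: we train the motion module, e.g., v3_sd15_mm.ckpt, to learn real-world motion patterns from videos.
Stage 3 (optional), Adapt to New Patterns: we train MotionLoRA, e.g., v2_lora_ZoomIn.ckpt, to efficiently adapt the motion module to specific motion patterns (camera zooming, rolling, etc.).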
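The motion module learned in stage 2 has to model how features change across frames. The AnimateDiff paper does this with temporal Transformer blocks that attend along the frame axis at each spatial location. The NumPy sketch below shows only that reshape-and-attend idea under simplifying assumptions: `temporal_self_attention` is a hypothetical name, and the learned Q/K/V and output projections, positional encoding, and multiple heads of the real module are omitted.

```python
import numpy as np


def temporal_self_attention(x: np.ndarray) -> np.ndarray:
    """Simplified single-head self-attention along the frame axis.

    x has shape (batch, channels, frames, height, width). Each spatial location
    attends only to the same location in the other frames.
    """
    b, c, f, h, w = x.shape
    # (b, c, f, h, w) -> (b*h*w, f, c): one sequence of f tokens per pixel.
    tokens = x.transpose(0, 3, 4, 2, 1).reshape(b * h * w, f, c)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)  # (b*h*w, f, f)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ tokens  # (b*h*w, f, c)
    # Undo the reshape: (b*h*w, f, c) -> (b, c, f, h, w).
    out = out.reshape(b, h, w, f, c).transpose(0, 4, 3, 1, 2)
    # Residual connection: the module adds to, rather than replaces, the incoming features.
    return x + out
```

Because each pixel's frame sequence is processed independently, spatial positions never mix inside this block; only the frame axis is new relative to an image UNet.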

SparseCtrl

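SparseCtrl aims to add more control to text-to-video models by adopting sparse inputs (e.g., a few RGB images or sketch inputs). Its technical details can be found in the following paper: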

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang✝, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai (✝Corresponding Author)
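One way to picture how "sparse" inputs reach the model: the few available condition maps are placed at their frame indices in a clip-length tensor, unconditioned frames are zero-filled, and a binary mask channel records which frames carry a condition. This is a minimal NumPy sketch under that assumption; `build_sparse_condition` is a hypothetical helper, not SparseCtrl's actual preprocessing code.

```python
import numpy as np


def build_sparse_condition(num_frames, keyframes, height, width, channels=3):
    """Pack a few per-frame condition maps into a dense clip-length tensor.

    keyframes maps a frame index to an array of shape (channels, height, width).
    Returns an array of shape (num_frames, channels + 1, height, width) whose
    last channel is a binary mask marking the conditioned frames.
    """
    cond = np.zeros((num_frames, channels + 1, height, width), dtype=np.float32)
    for idx, image in keyframes.items():
        if not 0 <= idx < num_frames:
            raise ValueError(f"frame index {idx} outside [0, {num_frames})")
        cond[idx, :channels] = image
        cond[idx, channels] = 1.0  # mask channel: this frame is conditioned
    return cond
```

For image animation one would pass a single entry such as `{0: image}`; for interpolation, conditions at the first and last frames, e.g. `{0: first, 15: last}` for a 16-frame clip.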

Model Versions

AnimateDiff v3 and SparseCtrl (2023.12)

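In this version, we use a Domain Adapter LoRA for image model fine-tuning, which provides more flexibility at inference. We also implement two SparseCtrl encoders (RGB image / scribble), which can take an arbitrary number of condition maps to control the animation contents.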
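The flexibility of a LoRA-based adapter comes from the standard LoRA update: a low-rank product added to a frozen weight, multiplied by a scale that can be tuned at inference. This is a generic sketch of that update for one linear layer, not this repository's checkpoint-loading code; `merge_lora` and its argument names are hypothetical, and some LoRA implementations additionally fold an alpha/rank factor into the scale, which is omitted here.

```python
import numpy as np


def merge_lora(weight, lora_down, lora_up, scale=1.0):
    """Return weight + scale * (lora_up @ lora_down) for one linear layer.

    weight:    (out_features, in_features) frozen base weight
    lora_down: (rank, in_features)
    lora_up:   (out_features, rank)
    scale = 0 recovers the base weight; scale = 1 applies the adapter fully.
    """
    if lora_down.shape[1] != weight.shape[1] or lora_up.shape[0] != weight.shape[0]:
        raise ValueError("LoRA factors do not match the base weight shape")
    return weight + scale * (lora_up @ lora_down)
```

Since the update is linear in `scale`, intermediate values interpolate between the base model and the fully adapted one.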

Limitations

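Minor flickering is noticeable;
To stay compatible with community models, there is no specific optimization for general T2V, which leads to limited visual quality in that setting;
(Style alignment) For uses such as image animation/interpolation, it is recommended to use images generated by the same community model.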

Demo

<table class="center"> <tr style="line-height: 0"> <td width=25% style="border: none; text-align: center">Input (generated by RealisticVision)</td> <td width=25% style="border: none; text-align: center">Animation</td> <td width=25% style="border: none; text-align: center">Input</td> <td width=25% style="border: none; text-align: center">Animation</td> </tr> <tr> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/7b6fdae9-44b5-42de-852c-2407e404ab90.png" style="width:100%"></td> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/a59571d5-eaa5-4e5d-86be-1093b1257509.gif" style="width:100%"></td> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/9a7bd7c7-2209-44a6-893f-d7fd681cf148.png" style="width:100%"></td> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/bfd00d0f-c5b6-4fd4-863b-56e042883b52.gif" style="width:100%"></td> </tr> </table> <table class="center"> <tr style="line-height: 0"> <td width=25% style="border: none; text-align: center">Input Scribble</td> <td width=25% style="border: none; text-align: center">Output</td> <td width=25% style="border: none; text-align: center">Input Scribble</td> <td width=25% style="border: none; text-align: center">Output</td> </tr> <tr> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/4e1fe8b5-ed1d-4c6f-8cf8-b47882f1f1d9.png" style="width:100%"></td> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/5c7df530-b97e-48b3-96fd-2d651d2db968.gif" style="width:100%"></td> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/c77d8ee9-1b4f-4c4e-964d-928a54c7c0db.png" style="width:100%"></td> <td width=25% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/8defbcfa-0d15-41a4-8369-521cc1591bc5.gif" style="width:100%"></td> </tr> </table>

AnimateDiff SDXL-Beta (2023.11)

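We release the motion module (beta version) for SDXL, available on Google Drive, HuggingFace and CivitAI. It can generate high-resolution videos (i.e., 1024x1024x16 frames with various aspect ratios) with or without personalized models. Inference usually requires about 13 GB of VRAM and tuned hyperparameters (e.g., the number of sampling steps), depending on the chosen personalized model. Check out the sdxl branch for more details on inference.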

<details close> <summary>AnimateDiff SDXL-Beta Model Zoo</summary>

| Name | HuggingFace | Type | Storage |
| - | - | - | - |
| `mm_sdxl_v10_beta.ckpt` | Link | Motion module | 950 MB |

</details>

Demo

<table class="center"> <tr style="line-height: 0"> <td width=52% style="border: none; text-align: center">Original SDXL</td> <td width=30% style="border: none; text-align: center">Community SDXL</td> <td width=18% style="border: none; text-align: center">Community SDXL</td> </tr> <tr> <td width=52% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/f00889c2-efb1-49a0-842e-37fdb387b406.gif" style="width:100%"></td> <td width=30% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/40484647-f40b-4244-9785-bd3d4b9bb106.gif" style="width:100%"></td> <td width=18% style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/65f2ab9c-beed-45ea-9005-8f5d2fbec130.gif" style="width:100%"></td> </tr> </table>

AnimateDiff v2 (2023.09)

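In this version, the motion module mm_sd_v15_v2.ckpt (Google Drive, HuggingFace, CivitAI) is trained at a larger resolution and batch size. We found that scaling up training significantly improves motion quality and diversity. We also support MotionLoRA for eight basic camera movements. Each MotionLoRA checkpoint takes up only 77 MB of storage and is available on Google Drive, HuggingFace and CivitAI.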
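The storage figures in the model zoo tables can be sanity-checked from the parameter counts. Assuming the weights are stored as 32-bit floats (4 bytes per parameter) and ignoring metadata overhead, the arithmetic lines up with the listed sizes: 453 M parameters give about 1.69 GiB (listed as 1.7 GB), 417 M give about 1.55 GiB (listed as 1.6 GB), and 19 M give about 72 MiB, close to the 74 to 77 MB quoted (the parameter counts are rounded). `checkpoint_size` is a hypothetical helper.

```python
def checkpoint_size(num_params, bytes_per_param=4):
    """Approximate checkpoint size as (GiB, MiB), assuming fp32 weights and no overhead."""
    size_bytes = num_params * bytes_per_param
    return size_bytes / 2**30, size_bytes / 2**20


motion_module_gib, _ = checkpoint_size(453e6)  # mm_sd_v15_v2.ckpt
_, motion_lora_mib = checkpoint_size(19e6)     # one MotionLoRA checkpoint
```

The same arithmetic shows why MotionLoRA is cheap to distribute: it adds roughly 4% of the motion module's parameters.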

<details close> <summary>AnimateDiff v2 Model Zoo</summary>

| Name | HuggingFace | Type | Parameters | Storage |
| - | - | - | - | - |
| `mm_sd_v15_v2.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/mm_sd_v15_v2.ckpt) | Motion module | 453 M | 1.7 GB |
| `v2_lora_ZoomIn.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_ZoomIn.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_ZoomOut.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_ZoomOut.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_PanLeft.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_PanLeft.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_PanRight.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_PanRight.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_TiltUp.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_TiltUp.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_TiltDown.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_TiltDown.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_RollingClockwise.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_RollingClockwise.ckpt) | MotionLoRA | 19 M | 74 MB |
| `v2_lora_RollingAnticlockwise.ckpt` | [Link](https://huggingface.co/guoyww/animatediff/blob/main/v2_lora_RollingAnticlockwise.ckpt) | MotionLoRA | 19 M | 74 MB |

Demo (MotionLoRA)

<table class="center"> <tr style="line-height: 0"> <td colspan="2" style="border: none; text-align: center">Zoom In</td> <td colspan="2" style="border: none; text-align: center">Zoom Out</td> <td colspan="2" style="border: none; text-align: center">Pan Left</td> <td colspan="2" style="border: none; text-align: center">Pan Right</td> </tr> <tr> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/443b953a-f670-4935-945a-d63d64119241.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/cf6d139c-9772-4c31-808c-cbd7454d7df9.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/f493b3d0-fbc1-48c5-bb28-96f3bf3a3aa2.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/3ed2dc06-0dd1-42ce-843a-9be7ce74d962.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/c58e8dd2-7ffc-4c54-9aed-5dc0b674c1f3.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/8b97d621-0fc7-4901-8b94-c90179d1d78c.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/b94ed533-f980-46aa-958e-edae209ba857.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/2b844762-4c6a-424d-a278-18a04e1d53c5.gif"></td> </tr> <tr style="line-height: 0"> <td colspan="2" style="border: none; text-align: center">Tilt Up</td> <td colspan="2" style="border: none; text-align: center">Tilt Down</td> <td colspan="2" style="border: none; text-align: center">Rolling Anti-Clockwise</td> <td colspan="2" style="border: none; text-align: center">Rolling Clockwise</td> </tr> <tr> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/bb8a60d7-6959-45cb-b61c-223400fbe737.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/74b58a88-e18f-458d-8a52-745c3b768d43.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/54e71584-05db-48d1-9fcd-e835b8502f54.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/08d86ef5-4813-4968-87d6-1d5ad4709c70.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/aad3a51c-34a2-48e5-8add-9a50ff8a1ddb.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/b161418d-240c-408c-8cf7-3cbdfab8ee8a.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/20b486d3-81dd-4e44-b6f4-4bdb1b84eaf5.gif"></td> <td style="border: none"><img src="https://yellow-cdn.veclightyear.com/835a84d5/5ab53db7-bf68-4bd1-9a3a-0ff43c100cd5.gif"></td> </tr> </table>

Demo (Improved Motions)

Here is a comparison between mm_sd_v15.ckpt (left) and the improved mm_sd_v15_v2.ckpt (right).

AnimateDiff v1 (2023.07)

The first version of AnimateDiff!

<details close> <summary>AnimateDiff v1 Model Zoo</summary>

| Name | HuggingFace | Parameters | Storage |
| - | - | - | - |
| `mm_sd_v14.ckpt` | Link | 417 M | 1.6 GB |
| `mm_sd_v15.ckpt` | Link | 417 M | 1.6 GB |

</details> </details>

Training

Please check Steps for Training for details.

Disclaimer

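This project is released for academic use only. We disclaim responsibility for user-generated content. Also, please note that our only official websites are https://github.com/guoyww/AnimateDiff and https://animatediff.github.io; all other websites are not associated with AnimateDiff.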

Contact Us

Yuwei Guo: guoyw@ie.cuhk.edu.hk
Ceyuan Yang: limbo0066@gmail.com
Bo Dai: doubledaibo@gmail.com

BibTeX

@article{guo2023animatediff,
  title={AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning},
  author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Liang, Zhengyang and Wang, Yaohui and Qiao, Yu and Agrawala, Maneesh and Lin, Dahua and Dai, Bo},
  journal={International Conference on Learning Representations},
  year={2024}
}

@article{guo2023sparsectrl,
  title={SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models},
  author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Agrawala, Maneesh and Lin, Dahua and Dai, Bo},
  journal={arXiv preprint arXiv:2311.16933},
  year={2023}
}