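LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models

This repository is the official PyTorch implementation of LaVie.

LaVie is a text-to-video (T2V) generation framework and a main component of the video generation system Vchitect. You can also check out our fine-tuned image-to-video (I2V) model, SEINE.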

News

[2024.07.08]: LaVie-2 is coming soon, stay tuned!

Installation

conda env create -f environment.yml 
conda activate lavie

Download Pre-Trained Models

Download the pre-trained LaVie models, Stable Diffusion 1.4, and stable-diffusion-x4-upscaler to ./pretrained_models. You should see the following structure:

├── pretrained_models
│   ├── lavie_base.pt
│   ├── lavie_interpolation.pt
│   ├── lavie_vsr.pt
│   ├── stable-diffusion-v1-4
│   │   ├── ...
│   └── stable-diffusion-x4-upscaler
│       ├── ...
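
As a quick sanity check before running inference, the layout above can be verified with a short script. This is only a minimal sketch: the helper name `missing_pretrained_files` is hypothetical and not part of the LaVie repository, and it checks only that the top-level entries shown in the tree exist, not that their contents are complete or valid.

```python
from pathlib import Path

# Entries taken from the directory tree above.
EXPECTED_FILES = ("lavie_base.pt", "lavie_interpolation.pt", "lavie_vsr.pt")
EXPECTED_DIRS = ("stable-diffusion-v1-4", "stable-diffusion-x4-upscaler")


def missing_pretrained_files(root="./pretrained_models"):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    # Checkpoints must be regular files; the Stable Diffusion entries must be directories.
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    missing = missing_pretrained_files()
    if missing:
        print("Missing from ./pretrained_models:", ", ".join(missing))
    else:
        print("All expected pre-trained model entries are present.")
```

Run it from the repository root so that the relative path ./pretrained_models resolves correctly.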

Gallery

(Gallery content omitted)

Feel free to try different prompts, and share your favorites with us!

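Inference

Inference consists of three steps: base T2V, video interpolation, and video super-resolution. We provide several options for generating videos: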

	Step 1	Step 2	Step 3	Resolution	Length (frames)
Option 1	✔			320x512	16
Option 2	✔	✔		320x512	61
Option 3	✔		✔	1280x2048	16
Option 4	✔	✔	✔	1280x2048	61

Feel free to try the different options :)
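
To make the options table easier to reason about, the sketch below encodes it as a small lookup: video interpolation changes the length, and video super-resolution changes the resolution. The function name `expected_output` is hypothetical, and the values are simply read off the table (in the order the table lists them), not queried from the models. The table's numbers are consistent with a 4x increase per side for super-resolution (320 to 1280, 512 to 2048), but that is an observation about the table, not a statement about the implementation.

```python
# Base output and per-step effects, read off the options table above.
BASE_RESOLUTION = (320, 512)
BASE_FRAMES = 16
INTERPOLATED_FRAMES = 61
UPSCALED_RESOLUTION = (1280, 2048)


def expected_output(interpolate=False, super_resolve=False):
    """Return (resolution, frames) for a chosen combination of optional steps.

    Step 1 (base T2V) always runs; `interpolate` enables step 2 and
    `super_resolve` enables step 3.
    """
    resolution = UPSCALED_RESOLUTION if super_resolve else BASE_RESOLUTION
    frames = INTERPOLATED_FRAMES if interpolate else BASE_FRAMES
    return resolution, frames


if __name__ == "__main__":
    # Option 4 in the table: all three steps.
    print(expected_output(interpolate=True, super_resolve=True))
```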

Step 1. Base T2V

Run the following command to generate videos from the base T2V model.

cd base
python pipelines/sample.py --config configs/sample.yaml

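Inference arguments in configs/sample.yaml:

ckpt_path: path to the downloaded LaVie base model, default is ../pretrained_models/lavie_base.pt
pretrained_models: path to the downloaded SD1.4, default is ../pretrained_models
output_folder: path for saving the generated results, default is ../res/base
seed: seed to use; None means a random seed is chosen
sample_method: scheduler to use; default is ddpm, options are ddpm, ddim, and eulerdiscrete
guidance_scale: CFG scale to use, default is 7.5
num_sampling_steps: number of denoising steps, default is 50
text_prompt: prompt to generate from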
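
The arguments above can be checked before launching a long sampling run. The following is a minimal sketch, not part of the LaVie code: `check_sample_config` is a hypothetical helper that works on a plain dictionary (for example, one loaded from configs/sample.yaml with a YAML library of your choice), and the actual config file may contain additional keys that this sketch ignores.

```python
ALLOWED_SAMPLE_METHODS = {"ddpm", "ddim", "eulerdiscrete"}

# Defaults as listed above; seed and text_prompt have no stated default.
DOCUMENTED_DEFAULTS = {
    "ckpt_path": "../pretrained_models/lavie_base.pt",
    "pretrained_models": "../pretrained_models",
    "output_folder": "../res/base",
    "sample_method": "ddpm",
    "guidance_scale": 7.5,
    "num_sampling_steps": 50,
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_sample_config(cfg):
    """Return a list of human-readable problems found in `cfg` (hypothetical helper)."""
    problems = []
    method = cfg.get("sample_method", DOCUMENTED_DEFAULTS["sample_method"])
    if method not in ALLOWED_SAMPLE_METHODS:
        problems.append(f"sample_method must be one of {sorted(ALLOWED_SAMPLE_METHODS)}, got {method!r}")
    steps = cfg.get("num_sampling_steps", DOCUMENTED_DEFAULTS["num_sampling_steps"])
    if not _is_int(steps) or steps <= 0:
        problems.append("num_sampling_steps must be a positive integer")
    scale = cfg.get("guidance_scale", DOCUMENTED_DEFAULTS["guidance_scale"])
    if isinstance(scale, bool) or not isinstance(scale, (int, float)):
        problems.append("guidance_scale must be a number")
    seed = cfg.get("seed")
    if seed is not None and not _is_int(seed):
        problems.append("seed must be an integer or None")
    if not cfg.get("text_prompt"):
        problems.append("text_prompt is missing or empty")
    return problems


if __name__ == "__main__":
    cfg = {**DOCUMENTED_DEFAULTS, "seed": None, "text_prompt": "a teddy bear walking on the street, 2k, high quality"}
    print(check_sample_config(cfg) or "config looks consistent with the documented arguments")
```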

The following results were generated with these arguments:

seed: 400, sample_method: ddpm, guidance_scale: 7.0, num_sampling_steps: 50 (results may differ across devices)

<table class="center"> <tr> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/a0c1769b-d6a8-4db3-a9f1-85af1a86ec31.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/f07b3b62-df21-4ea1-9d89-a2d59a8c81fe.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/b97174e5-4b33-49b9-acfb-9359e7b97653.gif"></td> </tr> <tr> <td>a Corgi walking in the park at sunrise, oil painting style</td> <td>a panda taking a selfie, 2k, high quality</td> <td>a polar bear playing drum kit in NYC Times Square, 4k, high resolution</td> </tr> <tr> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/e5dd5c11-894c-4173-9a19-2756482153ba.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/5d1d92c6-6cdf-42dc-a65a-c93d4647b858.gif"></td> <td><img src="https://yellow-cdn.veclightyear.com/835a84d5/a27a7d7f-16f3-42bf-ab4e-0b80687ef262.gif"></td> </tr> <tr> <td>a shark swimming in the clear Caribbean ocean, 2k, high quality</td> <td>a teddy bear walking on the street, 2k, high quality</td> <td>jungle river at sunset, ultra quality</td> </tr> </table>

Step 2 (optional). Video Interpolation

Run the following command to perform video interpolation.

cd interpolation
python sample.py --config configs/sample.yaml

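The default input video path is ./res/base, and results are saved under ./res/interpolation. In configs/sample.yaml, you can change the default input_folder to YOUR_INPUT_FOLDER. Input videos should be named prompt1.mp4, prompt2.mp4, and so on, and placed under YOUR_INPUT_FOLDER. Launching the code processes all input videos in input_folder.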
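
If your own videos do not already follow the prompt1.mp4, prompt2.mp4 naming, they can be copied into place with a few lines of Python. This is a minimal sketch under stated assumptions: `stage_input_videos` is a hypothetical helper, not part of the repository, and the text above only gives prompt1.mp4 and prompt2.mp4 as examples, so the assumption that numbering simply continues from 1 is ours.

```python
import shutil
from pathlib import Path


def stage_input_videos(sources, input_folder):
    """Copy each video in `sources` to input_folder/prompt1.mp4, prompt2.mp4, ...

    Hypothetical helper: the originals are left untouched and the
    destination folder is created if it does not exist.
    """
    input_folder = Path(input_folder)
    input_folder.mkdir(parents=True, exist_ok=True)
    staged = []
    for index, source in enumerate(sources, start=1):
        target = input_folder / f"prompt{index}.mp4"
        shutil.copyfile(source, target)
        staged.append(target)
    return staged
```

For example, `stage_input_videos(["clips/dog.mp4", "clips/cat.mp4"], "YOUR_INPUT_FOLDER")` would create YOUR_INPUT_FOLDER/prompt1.mp4 and YOUR_INPUT_FOLDER/prompt2.mp4; the clip paths here are placeholders.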

Step 3 (optional). Video Super-Resolution

Run the following command to perform video super-resolution.

cd vsr
python sample.py --config configs/sample.yaml

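The default input video path is ./res/base, and results are saved under ./res/vsr. You can change the default input_path to YOUR_INPUT_FOLDER in configs/sample.yaml. As in Step 2, input videos should be named prompt1.mp4, prompt2.mp4, and so on, and placed under YOUR_INPUT_FOLDER. Launching the code processes all input videos in that folder.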
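
The three steps can also be chained from a single script. The sketch below only wraps the commands shown in each step; the function names are hypothetical, and it must be run from the repository root. It does not edit any config files: since Step 3 reads from ./res/base by default, running it after Step 2 presumably requires pointing its input path at the interpolation results in configs/sample.yaml first.

```python
import subprocess
import sys

# (working directory, script) for each step, as given in the commands above.
STEPS = {
    "base": ("base", "pipelines/sample.py"),
    "interpolation": ("interpolation", "sample.py"),
    "vsr": ("vsr", "sample.py"),
}


def plan_steps(interpolate=False, super_resolve=False):
    """Return the ordered list of step names for the chosen option."""
    names = ["base"]
    if interpolate:
        names.append("interpolation")
    if super_resolve:
        names.append("vsr")
    return names


def build_command(step, python=sys.executable):
    """Return (cwd, argv) for one step; cwd is relative to the repository root."""
    cwd, script = STEPS[step]
    return cwd, [python, script, "--config", "configs/sample.yaml"]


def run_pipeline(interpolate=False, super_resolve=False):
    """Run the selected steps in order, stopping at the first failure."""
    for step in plan_steps(interpolate, super_resolve):
        cwd, argv = build_command(step)
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_pipeline(interpolate=True, super_resolve=True)` corresponds to option 4 in the options table.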

BibTeX

@article{wang2023lavie,
  title={LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models},
  author={Wang, Yaohui and Chen, Xinyuan and Ma, Xin and Zhou, Shangchen and Huang, Ziqi and Wang, Yi and Yang, Ceyuan and He, Yinan and Yu, Jiashuo and Yang, Peiqing and others},
  journal={arXiv preprint arXiv:2309.15103},
  year={2023}
}

@article{chen2023seine,
  title={SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction},
  author={Chen, Xinyuan and Wang, Yaohui and Zhang, Lingjun and Zhuang, Shaobin and Ma, Xin and Yu, Jiashuo and Wang, Yali and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
  journal={arXiv preprint arXiv:2310.20700},
  year={2023}
}

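Disclaimer

We disclaim responsibility for user-generated content. The model was not trained to realistically represent people or events, so using it to generate such content is beyond the model's capabilities. Generating pornographic, violent, or gory content is prohibited, as is generating content that demeans or harms people or their environment, culture, religion, and so on. Users are solely responsible for their own actions. The project contributors are not legally affiliated with, nor accountable for, users' behavior. Please use the generative model responsibly, adhering to ethical and legal standards.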