VEnhancer

<div align="center">
<h1>VEnhancer: Generative Space-Time Enhancement for Video Generation</h1>
<div>
<a href='https://scholar.google.com/citations?user=GUxrycUAAAAJ&hl=zh-CN' target='_blank'>Jingwen He</a>,&emsp;
<a href='https://tianfan.info' target='_blank'>Tianfan Xue</a>,&emsp;
<a href='https://github.com/ChrisLiu6' target='_blank'>Dongyang Liu</a>,&emsp;
<a href='https://github.com/0x3f3f3f3fun' target='_blank'>Xinqi Lin</a>,&emsp;
<a href='https://gaopengcuhk.github.io' target='_blank'>Peng Gao</a>,&emsp;
<a href='https://scholar.google.com/citations?user=GMzzRRUAAAAJ&hl=en' target='_blank'>Dahua Lin</a>,&emsp;
<a href='https://scholar.google.com/citations?user=gFtI-8QAAAAJ&hl=en' target='_blank'>Yu Qiao</a>,&emsp;
<a href='https://wlouyang.github.io' target='_blank'>Wanli Ouyang</a>,&emsp;
<a href='https://liuziwei7.github.io' target='_blank'>Ziwei Liu</a>
</div>
<div>
The Chinese University of Hong Kong,&emsp;Shanghai Artificial Intelligence Laboratory,&emsp;S-Lab, Nanyang Technological University&emsp;
</div>
<div>
<h4 align="center">
<a href="https://vchitect.github.io/VEnhancer-project/" target='_blank'><img src="https://img.shields.io/badge/🐳-Project_Page-blue"></a>
<a href="https://arxiv.org/abs/2407.07667" target='_blank'><img src="https://yellow-cdn.veclightyear.com/835a84d5/a4fc93f2-b62d-4aff-8273-8100f7dc412b.svg"></a>
<a href="https://youtu.be/QMR_5weifGg" target='_blank'><img src="https://yellow-cdn.veclightyear.com/835a84d5/f09b8351-25ea-4ddc-8194-fe4b1cbcc21e.svg?logo=YouTube&logoColor=white"></a>
</h4>
</div>

<strong>VEnhancer is a generative space-time enhancement framework that improves the results of existing text-to-video models.</strong>

<table class="center"> <tr> <td colspan="1">VideoCrafter2</td> <td colspan="1">+VEnhancer</td> </tr> <tr> <td> <img src=assets/input_raccoon_4.gif width="380"> </td> <td> <img src=assets/out_raccoon_4.gif width="380"> </td> </tr> <tr> <td> <img src=assets/input_fish.gif width="380"> </td> <td> <img src=assets/out_fish.gif width="380"> </td> </tr> </table>

:open_book: For more visual results, check out our <a href="https://vchitect.github.io/VEnhancer-project/" target="_blank">project page</a>.

</div>

🔥 Updates

[2024.07.28] Inference code and the pretrained video enhancement model are released.
[2024.07.10] This repo is created.

🎬 Overview

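The architecture of VEnhancer. It follows ControlNet: the architecture and weights of the multi-frame encoder and middle block of a pretrained video diffusion model are copied to build a trainable condition network. This video ControlNet takes both low-resolution key frames and the full frames of the noisy latents as input. In addition to the timestep $t$ and the prompt $c_{text}$, the noise level $\sigma$ used for noise augmentation and the downscaling factor $s$ serve as extra network conditions.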
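To make the conditioning scheme concrete, here is a minimal PyTorch sketch of the ControlNet pattern described above. It is not VEnhancer's implementation and every name in it is hypothetical: `ToyEncoder` stands in for the pretrained multi-frame encoder and middle block, the key-frame condition is assumed to be already upsampled to latent resolution with non-key frames set to zero, and the prompt is reduced to a pooled embedding added to the time embedding (the real model uses cross-attention).

```python
import copy
import math

import torch
import torch.nn as nn


def scalar_embedding(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embedding of a batch of scalars (t, sigma or s) -> (B, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = x.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for the pretrained multi-frame encoder + middle block."""

    def __init__(self, channels: int = 8, emb_dim: int = 16):
        super().__init__()
        self.conv_in = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.emb_proj = nn.Linear(emb_dim, channels)
        self.mid = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) video latents; emb: (B, emb_dim) conditioning vector
        h = self.conv_in(x) + self.emb_proj(emb)[:, :, None, None, None]
        return self.mid(torch.relu(h))


class VideoControlNetSketch(nn.Module):
    def __init__(self, base: ToyEncoder, channels: int = 8, emb_dim: int = 16):
        super().__init__()
        self.emb_dim = emb_dim
        # Trainable copy: same architecture and weights as the pretrained blocks.
        self.branch = copy.deepcopy(base)
        # The pretrained model itself stays frozen.
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Zero-initialised 1x1x1 convs: one injects the key-frame condition,
        # one gates the branch output, so training starts from the base model.
        self.cond_in = nn.Conv3d(channels, channels, kernel_size=1)
        self.zero_out = nn.Conv3d(channels, channels, kernel_size=1)
        for conv in (self.cond_in, self.zero_out):
            nn.init.zeros_(conv.weight)
            nn.init.zeros_(conv.bias)

    def forward(self, noisy, key_frames, t, text_emb, sigma, s):
        # Base conditions: timestep t and a (pooled) prompt embedding.
        emb = scalar_embedding(t, self.emb_dim) + text_emb
        base_out = self.base(noisy, emb)
        # Extra conditions for the branch only: noise level sigma and downscaling factor s.
        branch_emb = emb + scalar_embedding(sigma, self.emb_dim) + scalar_embedding(s, self.emb_dim)
        residual = self.branch(noisy + self.cond_in(key_frames), branch_emb)
        # Real ControlNets add residuals to the decoder's skip connections;
        # this sketch simply adds it to the frozen block's output.
        return base_out + self.zero_out(residual)


# Usage: batch of 2 clips, 8 latent channels, 5 frames of 4x4 latents.
torch.manual_seed(0)
base = ToyEncoder()
model = VideoControlNetSketch(base)
noisy = torch.randn(2, 8, 5, 4, 4)                  # noisy latents for all frames
key_frames = torch.zeros_like(noisy)
key_frames[:, :, ::2] = torch.randn(2, 8, 3, 4, 4)  # condition only on every other frame
t = torch.tensor([10, 500])
sigma = torch.tensor([0.1, 0.3])
s = torch.tensor([4.0, 2.0])
text_emb = torch.randn(2, 16)
out = model(noisy, key_frames, t, text_emb, sigma, s)
ref = base(noisy, scalar_embedding(t, 16) + text_emb)
```

At initialization the zero convolutions make the branch contribute nothing, so `out` equals `ref`. This is the ControlNet design choice that lets training start from the pretrained model's behaviour instead of from random residuals.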

:gear: Installation

# clone this repo
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer

# create environment
conda create -n venhancer python=3.10
conda activate venhancer
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
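Because the PyTorch packages are pinned to exact versions, a small stdlib-only check can confirm that the active environment matches them. This is a hypothetical helper, not part of the repository; `check_pins` is an illustrative name, and local build suffixes such as `+cu118` are ignored when comparing.

```python
from __future__ import annotations

from importlib import metadata

# Versions pinned by the install command above.
PINNED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for every pinned package that is missing or mismatched."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Drop a local version suffix such as "+cu118" before comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    if not issues:
        print("all pinned versions match")
```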

Note that the ffmpeg command must be available. If you have sudo access, you can install it with:

sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
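To confirm that ffmpeg is reachable from the Python environment the inference script runs in, a stdlib-only check like the one below can help. `ffmpeg_available` is a hypothetical helper for illustration; it only verifies that an `ffmpeg` executable on `PATH` answers `ffmpeg -version`, not which codecs it supports.

```python
import shutil
import subprocess


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if `binary` is on PATH and `binary -version` exits with status 0."""
    path = shutil.which(binary)
    if path is None:
        return False
    try:
        result = subprocess.run([path, "-version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg missing: install it with the command above")
```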

:dna: Pretrained Models

Model Name	Description	HuggingFace	BaiduNetdisk
venhancer_paper.pth	Video enhancement model (paper version)	download	download

💫 Inference

Download the CLIP model via open clip, the Stable Diffusion VAE via sd2.1, and the VEnhancer model. Then put these three checkpoints in the VEnhancer/ckpts directory.
Run the following command:

  bash run_VEnhancer.sh
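If `run_VEnhancer.sh` cannot load a checkpoint, a quick stdlib-only check can confirm that the files are where the inference step expects them. This is a hypothetical helper, not part of the repository: only `venhancer_paper.pth` is named in this README, so the file names of the CLIP and VAE checkpoints you downloaded must be added to the list yourself. It assumes you run it from the repository root, so the directory is `ckpts`.

```python
from __future__ import annotations

from pathlib import Path


def missing_checkpoints(ckpt_dir: str | Path, filenames: list[str]) -> list[str]:
    """Return the expected checkpoint files that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in filenames if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root (VEnhancer/). Only venhancer_paper.pth is named
    # in the README; append the file names of your downloaded CLIP and VAE checkpoints.
    expected = ["venhancer_paper.pth"]
    missing = missing_checkpoints("ckpts", expected)
    if missing:
        print("missing from ckpts/:", ", ".join(missing))
    else:
        print("all listed checkpoints found")
```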

BibTeX

If you use our work in your research, please cite our publication:

@article{he2024venhancer,
  title={VEnhancer: Generative Space-Time Enhancement for Video Generation},
  author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},
  journal={arXiv preprint arXiv:2407.07667},
  year={2024}
}