<h1 align="center">A Collection of Video Generation Studies</h1>
This repository summarizes papers and resources related to the video generation task.
If you have any suggestions about this repository, please feel free to open a new issue or pull request.
Recent news of this repository is listed below.
<details> <summary> 🔥 Click to see more information. </summary>

- [Jun. 17th] All NeurIPS 2023 papers and references are updated.
- [Apr. 26th] Added a new direction: Personalized Video Generation.
- [Mar. 28th] The official AAAI 2024 paper list is released! Official versions of the PDFs and BibTeX references are updated accordingly.
</details>
<!-- omit in toc -->
## <span id="contents">Contents</span>
<!-- omit in toc -->
## To-Do Lists
- Latest Papers
- Previously Published Papers
- Regular Maintenance of Preprint arXiv Papers and Missed Papers
[<u><small>🎯 Back to Top</small></u>](#contents)
<!-- omit in toc -->
## Products
Name | Organization | Year | Research Paper | Website | Specialties |
---|---|---|---|---|---|
Sora | OpenAI | 2024 | link | link | - |
Lumiere | Google | 2024 | link | link | - |
VideoPoet | Google | 2023 | - | link | - |
W.A.L.T | Google | 2023 | link | link | - |
Gen-2 | Runway | 2023 | - | link | - |
Gen-1 | Runway | 2023 | - | link | - |
Animate Anyone | Alibaba | 2023 | link | link | - |
Outfit Anyone | Alibaba | 2023 | - | link | - |
Stable Video | StabilityAI | 2023 | link | link | - |
Pixeling | HiDream.ai | 2023 | - | link | - |
DomoAI | DomoAI | 2023 | - | link | - |
Emu | Meta | 2023 | link | link | - |
Genmo | Genmo | 2023 | - | link | - |
NeverEnds | NeverEnds | 2023 | - | link | - |
Moonvalley | Moonvalley | 2023 | - | link | - |
Morph Studio | Morph | 2023 | - | link | - |
Pika | Pika | 2023 | - | link | - |
PixelDance | ByteDance | 2023 | link | link | - |
[<u><small>🎯 Back to Top</small></u>](#contents)
<!-- omit in toc -->
## Papers
<!-- omit in toc -->
### Survey Papers
- <span id="survey-year-2024">Year 2024</span>
- arXiv
- Video Diffusion Models: A Survey [Paper]
- <span id="survey-year-2023">Year 2023</span>
- arXiv
- A Survey on Video Diffusion Models [Paper]
<!-- omit in toc -->
### Text-to-Video Generation
- <span id="text-year-2024">Year 2024</span>
- CVPR
- Vlogger: Make Your Dream A Vlog [Paper] [Code]
- Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
- VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
- SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
- PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
- BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
- Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [Paper] [Code] [Project]
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
- Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper] [Project]
- DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper] [Code]
- Grid Diffusion Models for Text-to-Video Generation [Paper] [Code] [Video]
- ICLR
- VDT: General-purpose Video Diffusion Transformers via Mask Modeling [Paper] [Code] [Project]
- VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [Paper]
- AAAI
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
- E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
- ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [Paper]
- arXiv
- Lumiere: A Space-Time Diffusion Model for Video Generation [Paper] [Project]
- Boximator: Generating Rich and Controllable Motions for Video Synthesis [Paper] [Project] [Video]
- World Model on Million-Length Video And Language With RingAttention [Paper] [Code] [Project]
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [Paper] [Project]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Paper] [Code] [Project]
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper] [Project]
- Latte: Latent Diffusion Transformer for Video Generation [Paper] [Code] [Project]
- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework [Paper] [Code]
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [Paper] [Code] [Project] [Video]
- VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [Paper]
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [Paper] [Code] [Project] [Demo]
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code] [Project]
- Others
- Sora: Video Generation Models as World Simulators [Paper]
- <span id="text-year-2023">Year 2023</span>
- CVPR
- Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models [Paper] [Project] [Reproduced code]
- Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators