<h1 align="center">A Collection of Video Generation Studies</h1>
This repository summarizes papers and resources related to the video generation task.
If you have any suggestions about this repository, please feel free to open a new issue or pull request.
Recent news of this repository is listed below.
<details> <summary> 🔥 Click to see more information. </summary>

- [Jun. 17th] All NeurIPS 2023 papers and references are updated.
- [Apr. 26th] Added a new direction: Personalized Video Generation.
- [Mar. 28th] The official AAAI 2024 paper list is released! Official versions of the PDFs and BibTeX references are updated accordingly.
</details>
<!-- omit in toc -->
## <span id="contents">Contents</span>
<!-- omit in toc -->
## To-Do Lists
- Latest Papers
- Previously Published Papers
- Regular Maintenance of Preprint arXiv Papers and Missed Papers
[<u><small>🎯 Back to Top</small></u>](#contents)
<!-- omit in toc -->
## Products
Name | Organization | Year | Research Paper | Website | Specialties |
---|---|---|---|---|---|
Sora | OpenAI | 2024 | link | link | - |
Lumiere | Google | 2024 | link | link | - |
VideoPoet | Google | 2023 | - | link | - |
W.A.L.T | Google | 2023 | link | link | - |
Gen-2 | Runway | 2023 | - | link | - |
Gen-1 | Runway | 2023 | - | link | - |
Animate Anyone | Alibaba | 2023 | link | link | - |
Outfit Anyone | Alibaba | 2023 | - | link | - |
Stable Video | StabilityAI | 2023 | link | link | - |
Pixeling | HiDream.ai | 2023 | - | link | - |
DomoAI | DomoAI | 2023 | - | link | - |
Emu | Meta | 2023 | link | link | - |
Genmo | Genmo | 2023 | - | link | - |
NeverEnds | NeverEnds | 2023 | - | link | - |
Moonvalley | Moonvalley | 2023 | - | link | - |
Morph Studio | Morph | 2023 | - | link | - |
Pika | Pika | 2023 | - | link | - |
PixelDance | ByteDance | 2023 | link | link | - |
[<u><small>🎯 Back to Top</small></u>](#contents)
<!-- omit in toc -->
## Papers
<!-- omit in toc -->
### Survey Papers
- <span id="survey-year-2024">Year 2024</span>
- arXiv
- Video Diffusion Models: A Survey [Paper]
- <span id="survey-year-2023">Year 2023</span>
- arXiv
- A Survey on Video Diffusion Models [Paper]
<!-- omit in toc -->
### Text-to-Video Generation
- <span id="text-year-2024">Year 2024</span>
- CVPR
- Vlogger: Make Your Dream A Vlog [Paper] [Code]
- Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
- VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
- SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
- PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
- BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
- Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [Paper] [Code] [Project]
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
- Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper] [Project]
- DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper] [Code]
- Grid Diffusion Models for Text-to-Video Generation [Paper] [Code] [Video]
- ICLR
- VDT: General-purpose Video Diffusion Transformers via Mask Modeling [Paper] [Code] [Project]
- VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [Paper]
- AAAI
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
- E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
- ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [Paper]
- arXiv
- Lumiere: A Space-Time Diffusion Model for Video Generation [Paper] [Project]
- Boximator: Generating Rich and Controllable Motions for Video Synthesis [Paper] [Project] [Video]
- World Model on Million-Length Video And Language With RingAttention [Paper] [Code] [Project]
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [Paper] [Project]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Paper] [Code] [Project]
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper] [Project]
- Latte: Latent Diffusion Transformer for Video Generation [Paper] [Code] [Project]
- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework [Paper] [Code]
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [Paper] [Code] [Project] [Video]
- VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [Paper]
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [Paper] [Code] [Project] [Demo]
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code] [Project]
- Others
- Sora: Video Generation Models as World Simulators [Paper]
- <span id="text-year-2023">Year 2023</span>
- CVPR
- Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models [Paper] [Project] [Reproduced code]
- Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators