Awesome Dataset Distillation
<img src="https://img.shields.io/badge/Contributions-Welcome-278ea5" alt="Contrib"/> <img src="https://img.shields.io/badge/Number%20of%20Papers-164-FF6F00" alt="PaperNum"/>

Awesome Dataset Distillation is a comprehensive, curated collection of papers, code, and project pages from the dataset distillation field.
Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes a large real dataset (the training set) as input and outputs a small synthetic distilled dataset. The distilled dataset is evaluated by training models on it and testing them on a separate real dataset (the validation/test set). A good distilled dataset is not only useful for dataset understanding but also has various applications (e.g., continual learning, privacy, and neural architecture search). The task was first introduced in the paper Dataset Distillation [Tongzhou Wang et al., '18], along with a proposed algorithm that backpropagates through optimization steps. It was then first extended to real-world datasets in the paper Medical Dataset Distillation [Guang Li et al., '19], which also explored the potential of dataset distillation for privacy preservation. The paper Dataset Condensation [Bo Zhao et al., '20] first introduced gradient matching, which greatly promoted the development of the dataset distillation field.
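To make the distill, train, and test pipeline above concrete, here is a minimal sketch of the gradient-matching idea on a toy problem. It is not the algorithm of any paper listed here. It assumes a one-parameter linear model `y = w * x`, a single synthetic point, and a fixed set of probe weights standing in for sampled model initializations. All function names and data values are hypothetical and chosen for illustration only.

```python
# Toy gradient matching: distill a 1-D regression set into ONE synthetic point.
# Assumptions (not from any listed paper): model y = w * x, loss 0.5 * (w*x - y)^2,
# and fixed probe weights standing in for sampled model initializations.

def loss_gradient(w, xs, ys):
    """Gradient w.r.t. w of the mean loss 0.5 * (w*x - y)^2 over the pairs (xs, ys)."""
    return sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def distill(real_x, real_y, probe_ws=(-1.0, 0.0, 1.0), lr=0.005, steps=2000):
    """Learn one synthetic point (sx, sy) whose loss gradient matches the real one."""
    sx, sy = 1.0, 0.0  # arbitrary initialization of the synthetic point
    for _ in range(steps):
        grad_sx = grad_sy = 0.0
        for w in probe_ws:
            # Residual between synthetic and real gradients at this probe weight.
            r = loss_gradient(w, [sx], [sy]) - loss_gradient(w, real_x, real_y)
            # Chain rule on r**2: d(g_syn)/d(sx) = 2*w*sx - sy, d(g_syn)/d(sy) = -sx.
            grad_sx += 2.0 * r * (2.0 * w * sx - sy)
            grad_sy += 2.0 * r * (-sx)
        # Gradient descent step on the mean squared gradient mismatch.
        sx -= lr * grad_sx / len(probe_ws)
        sy -= lr * grad_sy / len(probe_ws)
    return sx, sy


def train(xs, ys):
    """Closed-form least-squares slope for y = w * x (the 'model training' step)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the pairs (xs, ys)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data for illustration only.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5.0, 6.0], [10.1, 11.8]

sx, sy = distill(train_x, train_y)
w_distilled = train([sx], [sy])      # train on the distilled set only
w_real = train(train_x, train_y)     # reference: train on the full real set
print("test MSE (distilled):", mse(w_distilled, test_x, test_y))
print("test MSE (real):     ", mse(w_real, test_x, test_y))
```

In this toy setting the gradients can be matched exactly, so the model fit to the single synthetic point ends up with the same slope as the model fit to all real points. With neural networks and image data the match is typically only approximate, which is why the distilled dataset is judged by test performance on real data.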
In recent years (2022 to now), dataset distillation has gained increasing attention in the research community, across many institutes and labs. More papers are now being published each year. This research has been steadily improving dataset distillation and exploring its variants and applications.
This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.
<img src="./images/logo.jpg" width="20%"/>
- :globe_with_meridians: Project Page
- :octocat: Code
- :book: bibtex
Latest Updates
- [Call for papers] The First Dataset Distillation Challenge (Kai Wang & Ahmad Sajedi et al., ECCV 2024) :globe_with_meridians: :octocat:
- [2024/08/07] Prioritize Alignment in Dataset Distillation (Zekai Li & Ziyao Guo et al., 2024) :octocat: :book:
- [2024/08/02] Dataset Distillation for Offline Reinforcement Learning (Jonathan Light & Yuanzhe Liu et al., ICML 2024 Workshop) :globe_with_meridians: :octocat: :book:
- [2024/07/29] An Aggregation-Free Federated Learning for Tackling Data Heterogeneity (Yuan Wang et al., CVPR 2024) :book:
- [2024/07/25] Dataset Distillation in Medical Imaging: A Feasibility Study (Muyang Li et al., 2024) :book:
- [2024/07/25] A Theoretical Study of Dataset Distillation (Zachary Izzo et al., NeurIPS 2023 Workshop) :book:
- [2024/07/23] Dataset Distillation by Automatic Training Trajectories (Dai Liu et al., ECCV 2024) :octocat: :book:
- [2024/07/16] FYI: Flip Your Images for Dataset Distillation (Byunggwan Son et al., ECCV 2024) :globe_with_meridians: :book:
- [2024/07/13] Differentially Private Dataset Condensation (Zheng et al., NDSS 2024 Workshop) :book:
- [2024/07/11] Dataset Quantization with Active Learning based Adaptive Sampling (Zhenghao Zhao et al., ECCV 2024) :octocat: :book:
Contents
Main
<a name="early-work" />
Early Work
<a name="gradient-objective" />
Gradient/Trajectory Matching Surrogate Objective
- Dataset Condensation with Gradient Matching (Bo Zhao et al., ICLR 2021) :octocat: :book:
- Dataset Condensation with Differentiable Siamese Augmentation (Bo Zhao et al., ICML 2021) :octocat: :book:
- Dataset Distillation by Matching Training Trajectories (George Cazenavette et al., CVPR 2022) :globe_with_meridians: :octocat: :book:
- Dataset Condensation with Contrastive Signals (Saehyung Lee et al., ICML 2022) :octocat: :book:
- Loss-Curvature Matching for Dataset Selection and Condensation (Seungjae Shin & Heesun Bae et al., AISTATS 2023) :octocat: :book:
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation (Jiawei Du & Yidi Jiang et al., CVPR 2023) :octocat: :book:
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory (Justin Cui et al., ICML 2023) :octocat: :book:
- Sequential Subset Matching for Dataset Distillation (Jiawei Du et al., NeurIPS 2023) :octocat: :book:
- Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching (Ziyao Guo & Kai Wang et al., ICLR 2024) :globe_with_meridians: :octocat: :book:
- SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching (Yongmin Lee et al., ICML 2024) :octocat: :book:
- Dataset Distillation by Automatic Training Trajectories (Dai Liu et al., ECCV 2024) :octocat: :book:
- Prioritize Alignment in Dataset Distillation (Zekai Li & Ziyao Guo et al., 2024) :octocat: :book:
<a name="feature-objective" />
Distribution/Feature Matching Surrogate Objective
<a name="optimization" />
Better Optimization