Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory

Authors: Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TESLA on 3 datasets including CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1K (Russakovsky et al., 2015) (Appendix A.3). ... The resulting algorithm sets new SOTA on ImageNet-1K: we can scale up to 50 IPCs (Images Per Class) on ImageNet-1K on a single GPU (all previous methods can only scale to 2 IPCs on ImageNet-1K), leading to the best accuracy (only a 5.9% accuracy drop against full-dataset training) while utilizing only 4.2% of the number of data points, an 18.2% absolute gain over prior SOTA. ... We compare TESLA against the random baseline and previous SOTA methods including DSA (Zhao & Bilen, 2021b), DM (Zhao & Bilen, 2021a), KIP (Nguyen et al., 2021), FrePo (Zhou et al., 2022) and the original MTT. The results are presented in Table 1.
Researcher Affiliation | Collaboration | Justin Cui (1), Ruochen Wang (1), Si Si (2), Cho-Jui Hsieh (1); (1) Department of Computer Science, University of California, Los Angeles; (2) Google Research. Correspondence to: Justin Cui <justincui@ucla.edu>, Cho-Jui Hsieh <chohsieh@cs.ucla.edu>.
Pseudocode | Yes | The proposed algorithm, TrajEctory matching with Soft Label Assignment (TESLA), which combines the memory-efficient gradient computation of the trajectory matching loss and the soft label assignment method, is summarized in Algorithm 1 and Figure 1. (A minimal sketch of the trajectory matching objective that TESLA builds on is given after this table.)
Open Source Code | No | The paper discusses open-sourced code in relation to other methods (e.g., KIP, FrePo) but does not provide an explicit statement or link for the code of its proposed method (TESLA).
Open Datasets | Yes | We evaluate TESLA on 3 datasets including CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1K (Russakovsky et al., 2015) (Appendix A.3).
Dataset Splits | No | The paper mentions '50,000 training and 10,000 testing images' for CIFAR-10/100 and similar figures for ImageNet-1K, indicating standard training and testing splits. However, it does not explicitly state specific validation dataset splits or proportions.
Hardware Specification | Yes | All of our experiments are run using one NVIDIA A6000 GPU with 48GB of memory. When measuring the memory consumption of the original MTT, if it doesn't fit into one GPU, we use two NVIDIA A6000 GPUs just for measurement purposes.
Software Dependencies | No | The paper mentions 'Kornia (Riba et al., 2019) ZCA' but does not provide specific version numbers for Kornia or any other software dependencies such as PyTorch. (A sketch of ZCA preprocessing with Kornia follows after this table.)
Experiment Setup | Yes | Experiment Settings: We evaluate TESLA on 3 datasets including CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1K (Russakovsky et al., 2015) (Appendix A.3). ... For the surrogate model, we use the same ConvNet architecture as DSA/DM/MTT. Each of the model's convolutional layers consists of 128 filters with kernel size 3×3, followed by instance normalization (Ulyanov et al., 2016), ReLU activation, and an average pooling layer with kernel size 2×2 and stride 2. ... Table 6: Hyperparameters used to get the distilled dataset. This table lists IPC, Matching Steps, Teacher Epochs, Max Start Epoch, and Batch Size for each dataset. (A sketch of this ConvNet block follows below.)
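
For the Pseudocode row above, the following is a minimal sketch (not the authors' released code) of the normalized trajectory matching objective that TESLA inherits from MTT; the names student_params, teacher_start, and teacher_target are illustrative assumptions. TESLA's constant-memory contribution concerns how the gradient of this loss with respect to the synthetic images is computed: it is accumulated step by step over the unrolled student updates instead of backpropagating through the whole unrolled trajectory at once.

import torch

def trajectory_matching_loss(student_params, teacher_start, teacher_target):
    # Squared distance between the student's final parameters (after training on
    # the synthetic data) and the teacher's target checkpoint, normalized by how
    # far the teacher itself moved over the matched segment.
    # All arguments are lists of torch.Tensor parameter tensors.
    num = sum(((s - t) ** 2).sum() for s, t in zip(student_params, teacher_target))
    den = sum(((s0 - t) ** 2).sum() for s0, t in zip(teacher_start, teacher_target))
    return num / den

# Toy usage with random tensors standing in for network checkpoints.
shapes = [(128, 3, 3, 3), (128,)]
theta_target = [torch.randn(s) for s in shapes]
theta_start = [torch.randn(s) for s in shapes]
theta_student = [torch.randn(s) for s in shapes]
loss = trajectory_matching_loss(theta_student, theta_start, theta_target)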
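
For the Software Dependencies row, ZCA preprocessing can be reproduced roughly as below. This is a sketch assuming Kornia's kornia.enhance.ZCAWhitening module; the eps value and the CIFAR-shaped dummy tensor are assumptions, since the paper names Kornia ZCA but not the library version or exact settings.

import torch
import kornia

images = torch.rand(512, 3, 32, 32)          # stand-in CIFAR-shaped batch, not paper data
flat = images.flatten(1)                     # one feature vector per image
zca = kornia.enhance.ZCAWhitening(eps=0.1)   # eps is an assumed value
zca.fit(flat)                                # estimate the whitening transform
whitened = zca(flat).reshape(images.shape)   # whiten, then restore the image shape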
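
For the Experiment Setup row, the surrogate block described there can be sketched in PyTorch as follows; the depth (three blocks), the 32×32 input resolution, and the linear classifier head are illustrative assumptions rather than details confirmed by the excerpt above.

import torch.nn as nn

def conv_block(in_channels, out_channels=128):
    # 128 filters of size 3x3, instance normalization, ReLU, 2x2 average pooling.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_channels, affine=True),
        nn.ReLU(inplace=True),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )

# Example: a 3-block ConvNet for 32x32 inputs (e.g., CIFAR-10) with a linear head.
model = nn.Sequential(
    conv_block(3), conv_block(128), conv_block(128),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 10),
)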