Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
Authors: Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TESLA on 3 datasets including CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1K (Russakovsky et al., 2015) (Appendix A.3). ... The resulting algorithm sets new SOTA on ImageNet-1K: we can scale up to 50 IPCs (Images Per Class) on ImageNet-1K on a single GPU (all previous methods can only scale to 2 IPCs on ImageNet-1K), leading to the best accuracy (only 5.9% accuracy drop against full dataset training) while utilizing only 4.2% of the number of data points, an 18.2% absolute gain over prior SOTA. ... We compare TESLA against the random baseline and previous SOTA methods including DSA (Zhao & Bilen, 2021b), DM (Zhao & Bilen, 2021a), KIP (Nguyen et al., 2021), FrePo (Zhou et al., 2022) and the original MTT. The results are presented in Table 1. |
| Researcher Affiliation | Collaboration | Justin Cui¹, Ruochen Wang¹, Si Si², Cho-Jui Hsieh¹. ¹Department of Computer Science, University of California, Los Angeles; ²Google Research. Correspondence to: Justin Cui <justincui@ucla.edu>, Cho-Jui Hsieh <chohsieh@cs.ucla.edu>. |
| Pseudocode | Yes | The proposed algorithm, TrajEctory matching with Soft Label Assignment (TESLA), which combines the memory-efficient gradient computation of the trajectory matching loss and the soft label assignment method, is summarized in Algorithm 1 and Figure 1. (A sketch of the underlying matching objective is given after the table.) |
| Open Source Code | No | The paper discusses open-sourced code in relation to other methods (e.g., KIP, FrePo) but does not provide an explicit statement or link for the code of their proposed method (TESLA). |
| Open Datasets | Yes | We evaluate TESLA on 3 datasets including CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1K (Russakovsky et al., 2015) (Appendix A.3). |
| Dataset Splits | No | The paper mentions '50,000 training and 10,000 testing images' for CIFAR-10/100 and similar figures for ImageNet-1K, indicating standard training and testing splits. However, it does not explicitly state specific validation dataset splits or proportions. |
| Hardware Specification | Yes | All of our experiments are run using one NVIDIA A6000 GPU with 49GB of memory. When measuring the memory consumption used by the original MTT, if it doesn't fit into one GPU, we use two NVIDIA A6000 GPUs just for measuring purposes. |
| Software Dependencies | No | The paper mentions 'Kornia (Riba et al., 2019) ZCA' but does not provide specific version numbers for Kornia or any other software dependencies such as PyTorch. |
| Experiment Setup | Yes | Experiment Settings: We evaluate TESLA on 3 datasets including CIFAR-10/100 (Krizhevsky et al., 2009) and ImageNet-1K (Russakovsky et al., 2015) (Appendix A.3). ... For the surrogate model, we use the same ConvNet architecture as DSA/DM/MTT. The model's convolutional layer consists of 128 filters with kernel size 3×3 followed by Instance Normalization (Ulyanov et al., 2016), ReLU activation, and an average pooling layer with kernel size 2×2 and stride 2. ... Table 6: Hyperparameters used to get the distilled dataset. This table lists IPC, Matching Steps, Teacher Epochs, Max Start Epoch, and Batch Size for each dataset. (A sketch of this ConvNet block is given after the table.) |
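
The Pseudocode row refers to TESLA combining a memory-efficient gradient computation for the trajectory matching loss with soft label assignment. For reference, the matching objective inherited from MTT is the squared distance between the student parameters after N inner steps on the distilled data and a later teacher checkpoint, normalized by how far the teacher itself moved between the two checkpoints. The sketch below only illustrates that objective; it does not reproduce TESLA's constant-memory gradient computation or the soft-label assignment, and all names are illustrative assumptions.

```python
# Minimal sketch of the MTT-style trajectory matching objective that TESLA
# optimizes (illustrative only; not the paper's implementation).
import torch

def trajectory_matching_loss(student_params: torch.Tensor,
                             teacher_start: torch.Tensor,
                             teacher_target: torch.Tensor) -> torch.Tensor:
    """student_params: flattened student weights after N inner steps on the
                       distilled data, starting from `teacher_start`.
       teacher_start:  teacher checkpoint the student was initialized from.
       teacher_target: teacher checkpoint from M epochs later on the real data."""
    numerator = (student_params - teacher_target).pow(2).sum()
    denominator = (teacher_start - teacher_target).pow(2).sum()
    return numerator / denominator
```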
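
The Experiment Setup row describes the surrogate ConvNet block: 128 filters, 3×3 convolution, instance normalization, ReLU, and 2×2 average pooling with stride 2. Below is a minimal PyTorch sketch of such a block; the block depth and the linear classifier head are assumptions about the standard DSA/DM/MTT ConvNet, not details quoted from the paper.

```python
# Sketch of the ConvNet block described in the Experiment Setup row.
# The depth (3 blocks) and LazyLinear head are assumptions, not quoted values.
import torch
import torch.nn as nn

def conv_block(in_channels: int, out_channels: int = 128) -> nn.Sequential:
    # 3x3 conv -> instance norm -> ReLU -> 2x2 average pooling with stride 2
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_channels, affine=True),
        nn.ReLU(inplace=True),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )

class ConvNet(nn.Module):
    def __init__(self, num_classes: int, in_channels: int = 3,
                 depth: int = 3, width: int = 128):
        super().__init__()
        blocks, channels = [], in_channels
        for _ in range(depth):
            blocks.append(conv_block(channels, width))
            channels = width
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.LazyLinear(num_classes)  # infers flattened size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# Example: a CIFAR-10-sized input (32x32) through a depth-3 ConvNet.
model = ConvNet(num_classes=10)
logits = model(torch.randn(2, 3, 32, 32))  # -> shape (2, 10)
```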