Embarrassingly Simple Dataset Distillation
Authors: Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments on four standard datasets, CIFAR-10 (10 classes, 32×32), CIFAR-100 (100 classes, 32×32, Krizhevsky et al. (2009)), Caltech Birds 2011 (200 classes, CUB-200, 32×32, Wah et al. (2011)), Tiny-ImageNet (200 classes, 64×64, Le & Yang (2015)), ImageNet-1K (1,000 classes, 64×64, Deng et al. (2009)). |
| Researcher Affiliation | Academia | Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe; Center for Data Science, New York University; Courant Institute of Mathematical Sciences, New York University; yf2231@nyu.edu |
| Pseudocode | Yes (sketched below the table) | Algorithm 1: Dataset Distillation with RaT-BPTT. |
| Open Source Code | Yes | We release our code at https://github.com/fengyzpku/Simple_Dataset_Distillation. |
| Open Datasets | Yes | Datasets: We run experiments on four standard datasets, CIFAR-10 (10 classes, 32×32), CIFAR-100 (100 classes, 32×32, Krizhevsky et al. (2009)), Caltech Birds 2011 (200 classes, CUB-200, 32×32, Wah et al. (2011)), Tiny-ImageNet (200 classes, 64×64, Le & Yang (2015)), ImageNet-1K (1,000 classes, 64×64, Deng et al. (2009)). |
| Dataset Splits | Yes | Parameters such as unrolling length and window size are determined via a validation set. |
| Hardware Specification | Yes | We have conducted a comparative analysis of the total training time for several methods, utilizing a consistent computational environment on an NVIDIA RTX 8000 with 48 GB of memory. |
| Software Dependencies | No | The paper mentions "Higher package (Grefenstette et al., 2019)" and "Adam" as software used, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes (see the configuration sketch below the table) | We opt for a simple setup: using Adam for inner optimization with a learning rate of 0.001, and applying standard augmentations (flip and rotation) on the target set. Parameters such as unrolling length and window size are determined via a validation set. ... We utilize the Adam optimizer for both the inner loop (network unrolling) and the outer loop (distilled dataset optimization) with learning rates uniformly set to 0.001 for CIFAR-10, CIFAR-100, and CUB-200, and to 0.0003 for Tiny-ImageNet and ImageNet-1K. ... We maintain a window size to unrolling length ratio of around 1:3. ... batch sizes of 5,000 for CIFAR-10 and CIFAR-100, 3,000 for CUB-200, 1,000 for Tiny-ImageNet, and 1,500 for ImageNet-1K. |
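The Pseudocode and Experiment Setup rows describe Algorithm 1 (RaT-BPTT): a freshly initialized network is unrolled on the distilled data with Adam, and the meta-gradient reaches the distilled images only through a truncated window of the most recent inner steps, with the window position randomized across outer steps. Below is a minimal PyTorch reading of that description using the higher package the paper cites; it is a sketch, not the released implementation, and the identifiers (`rat_bptt_step`, `make_model`, `distilled_x`, `max_unroll`, `window`) as well as the default step counts and the omission of soft labels and augmentation are our simplifying assumptions.

```python
import torch
import torch.nn.functional as F
import higher  # differentiable inner-loop optimization, as cited in the paper


def rat_bptt_step(distilled_x, distilled_y, real_x, real_y, make_model,
                  max_unroll=200, window=60, inner_lr=1e-3):
    """One outer-loop step: unroll a fresh network on the distilled data for a
    random number of steps, backpropagating to the distilled images only
    through the last `window` inner updates (truncated BPTT)."""
    model = make_model().to(distilled_x.device)
    inner_opt = torch.optim.Adam(model.parameters(), lr=inner_lr)

    # Randomize where the truncation window lands by sampling the unroll length.
    total_steps = torch.randint(window, max_unroll + 1, (1,)).item()

    # Phase 1: plain (non-differentiable) training up to the start of the window.
    model.train()
    for _ in range(total_steps - window):
        loss = F.cross_entropy(model(distilled_x.detach()), distilled_y)
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()

    # Phase 2: differentiable window; higher records the unrolled Adam updates
    # so gradients can flow from the meta-loss back into distilled_x.
    with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
        for _ in range(window):
            inner_loss = F.cross_entropy(fmodel(distilled_x), distilled_y)
            diffopt.step(inner_loss)

        # Meta-loss: performance of the unrolled network on a batch of real data.
        # backward() here accumulates the meta-gradient into distilled_x.grad.
        meta_loss = F.cross_entropy(fmodel(real_x), real_y)
        meta_loss.backward()
    return meta_loss.item()
```

The 60/200 defaults above simply mirror the quoted ~1:3 window-to-unrolling ratio; the paper tunes both via a validation set.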
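The Experiment Setup row pins down the remaining hyperparameters. The sketch below shows how they could slot into the outer loop around `rat_bptt_step` from the previous block; only the learning rates, real-batch sizes, and the ~1:3 window-to-unrolling ratio come from the quoted text, while the dictionary layout, the `window_for` helper, the stand-in ConvNet, the random stand-in data, and the 10-images-per-class distilled set are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Learning rates (used for both inner and outer Adam) and real-data batch
# sizes as quoted in the Experiment Setup row; keys and layout are our own.
SETUP = {
    "cifar10":       dict(lr=1e-3, real_batch=5_000),
    "cifar100":      dict(lr=1e-3, real_batch=5_000),
    "cub200":        dict(lr=1e-3, real_batch=3_000),
    "tiny_imagenet": dict(lr=3e-4, real_batch=1_000),
    "imagenet1k":    dict(lr=3e-4, real_batch=1_500),
}

def window_for(unroll_length: int) -> int:
    """Window size kept at roughly one third of the unrolling length (1:3 ratio)."""
    return max(1, unroll_length // 3)

def make_model():
    # Stand-in classifier; the paper's exact network architecture is not quoted here.
    return nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10),
    )

# Illustrative outer loop for CIFAR-10 with 10 distilled images per class,
# using random tensors as a stand-in for the real training set.
cfg = SETUP["cifar10"]
distilled_x = torch.randn(100, 3, 32, 32, requires_grad=True)
distilled_y = torch.arange(10).repeat_interleave(10)
outer_opt = torch.optim.Adam([distilled_x], lr=cfg["lr"])

real_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(10_000, 3, 32, 32),
                                   torch.randint(0, 10, (10_000,))),
    batch_size=cfg["real_batch"], shuffle=True,
)

for real_x, real_y in real_loader:
    outer_opt.zero_grad()
    loss = rat_bptt_step(distilled_x, distilled_y, real_x, real_y,
                         make_model, inner_lr=cfg["lr"])
    outer_opt.step()
```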