Embarrassingly Simple Dataset Distillation

Authors: Yunzhen Feng, Shanmukha Ramakrishna Vedantam, Julia Kempe

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments on five standard datasets: CIFAR-10 (10 classes, 32×32), CIFAR-100 (100 classes, 32×32; Krizhevsky et al., 2009), Caltech-UCSD Birds 2011 (CUB-200; 200 classes, 32×32; Wah et al., 2011), Tiny-ImageNet (200 classes, 64×64; Le & Yang, 2015), and ImageNet-1K (1,000 classes, 64×64; Deng et al., 2009).
Researcher Affiliation | Academia | Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe; Center for Data Science, New York University; Courant Institute of Mathematical Sciences, New York University; yf2231@nyu.edu
Pseudocode | Yes | Algorithm 1: Dataset Distillation with RaT-BPTT. (A hedged code sketch of this algorithm appears after the table.)
Open Source Code | Yes | We release our code at https://github.com/fengyzpku/Simple_Dataset_Distillation.
Open Datasets | Yes | Datasets: We run experiments on five standard datasets: CIFAR-10 (10 classes, 32×32), CIFAR-100 (100 classes, 32×32; Krizhevsky et al., 2009), Caltech-UCSD Birds 2011 (CUB-200; 200 classes, 32×32; Wah et al., 2011), Tiny-ImageNet (200 classes, 64×64; Le & Yang, 2015), and ImageNet-1K (1,000 classes, 64×64; Deng et al., 2009).
Dataset Splits | Yes | Parameters such as unrolling length and window size are determined via a validation set.
Hardware Specification | Yes | We have conducted a comparative analysis of the total training time for several methods, utilizing a consistent computational environment on an RTX 8000 with 48 GB.
Software Dependencies | No | The paper mentions the "higher package (Grefenstette et al., 2019)" and "Adam" as software used, but does not provide specific version numbers for these components.
Experiment Setup | Yes | We opt for a simple setup: using Adam for inner optimization with a learning rate of 0.001, and applying standard augmentations (flip and rotation) on the target set. Parameters such as unrolling length and window size are determined via a validation set. ... We utilize the Adam optimizer for both the inner loop (network unrolling) and the outer loop (distilled dataset optimization), with learning rates uniformly set to 0.001 for CIFAR-10, CIFAR-100, and CUB-200, and to 0.0003 for Tiny-ImageNet and ImageNet-1K. ... We maintain a window size to unrolling length ratio of around 1:3. ... batch sizes of 5,000 for CIFAR-10 and CIFAR-100, 3,000 for CUB-200, 1,000 for Tiny-ImageNet, and 1,500 for ImageNet-1K. (These hyperparameters are consolidated in the configuration sketch below.)
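
To make the quoted algorithm concrete, below is a minimal sketch of one RaT-BPTT outer step, assuming PyTorch and the higher package that the paper cites. Helper names such as make_model, distilled_x, distilled_y, and real_loader are hypothetical placeholders, not identifiers from the authors' released code.

import random
import torch
import torch.nn.functional as F
import higher

def rat_bptt_step(make_model, distilled_x, distilled_y, real_loader,
                  max_unroll=180, window=60, inner_lr=1e-3, device="cpu"):
    # One outer-loop update of the distilled images: unroll a freshly
    # initialized network on the distilled data for a random number of
    # steps, backpropagating only through the final `window` steps.
    model = make_model().to(device)
    n_unroll = random.randint(window, max_unroll)  # random truncation point

    # Burn-in phase: ordinary inner updates, no meta-gradient tracked.
    inner_opt = torch.optim.Adam(model.parameters(), lr=inner_lr)
    for _ in range(n_unroll - window):
        loss = F.cross_entropy(model(distilled_x.detach()), distilled_y)
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()

    # Differentiable window: `higher` keeps the unrolled computation
    # graph so the meta-gradient can flow back into the distilled images.
    with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
        for _ in range(window):
            diffopt.step(F.cross_entropy(fmodel(distilled_x), distilled_y))

        # Meta-loss on real data; backward() accumulates gradients on
        # distilled_x through the `window` unrolled inner steps.
        real_x, real_y = next(iter(real_loader))
        meta_loss = F.cross_entropy(fmodel(real_x.to(device)), real_y.to(device))
        meta_loss.backward()
    return meta_loss.item()

In use, distilled_x would be a leaf tensor with requires_grad=True, updated after each such step by an outer torch.optim.Adam at the learning rates quoted above; the defaults max_unroll=180 and window=60 simply illustrate the roughly 1:3 window-to-unroll ratio.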
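
For quick reference, the per-dataset hyperparameters quoted in the Experiment Setup row can be collected into a single configuration; the dict below is a hypothetical consolidation, with key and field names that are illustrative rather than taken from the released repository.

# Per-dataset settings quoted in the paper: the Adam learning rate
# (shared by the inner and outer loops) and the batch size drawn from
# the target set for the meta-loss.
RAT_BPTT_CONFIGS = {
    "CIFAR-10":      {"lr": 1e-3, "target_batch_size": 5000},
    "CIFAR-100":     {"lr": 1e-3, "target_batch_size": 5000},
    "CUB-200":       {"lr": 1e-3, "target_batch_size": 3000},
    "Tiny-ImageNet": {"lr": 3e-4, "target_batch_size": 1000},
    "ImageNet-1K":   {"lr": 3e-4, "target_batch_size": 1500},
}
# The window-to-unroll ratio is kept near 1:3, with unrolling length and
# window size tuned on a validation split, per the quoted setup.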