A Label is Worth A Thousand Images in Dataset Distillation

Authors: Tian Qin, Zhiwei Deng, David Alvarez-Melis

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a series of ablation experiments, we study the role of soft labels in depth. Our results reveal that the main factor explaining the performance of state-of-the-art distillation methods is not the specific techniques used to generate synthetic data but rather the use of soft labels.
Researcher Affiliation | Collaboration | Tian Qin, Harvard University, Cambridge, MA (tqin@g.harvard.edu); Zhiwei Deng, Google DeepMind, Mountain View, CA (zhiweideng@google.com); David Alvarez-Melis, Harvard University & MSR, Cambridge, MA (dam@seas.harvard.edu)
Pseudocode | Yes | Algorithm 1: Learn soft label with BPTT (a minimal sketch of this idea appears after the table).
Open Source Code | Yes | Code for all experiments is available at https://github.com/sunnytqin/no-distillation.
Open Datasets | Yes | Table 1: Benchmark SOTA methods against CutMix baseline and soft label baseline on ImageNet-1K. Table 2: Benchmark SOTA methods against soft label baseline ("SL baseline") on Tiny ImageNet, CIFAR-100, and CIFAR-10.
Dataset Splits | No | The paper does not explicitly specify validation splits or how they were derived from the training data. It discusses expert training and hyperparameter tuning, which usually imply a validation set, and references a 'standard training recipe', but it does not detail the splits.
Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 SXM4 40GB or NVIDIA H100 80GB HBM3.
Software Dependencies | No | The paper mentions PyTorch and cites its 2019 paper [21], implying a version context from that year, but it does not give a specific version number (e.g., 'PyTorch 1.9') for the dependency.
Experiment Setup | Yes | We follow a standard training recipe to train experts on downsized ImageNet-1K, Tiny ImageNet, CIFAR-10, and CIFAR-100. This standard training recipe involves an SGD optimizer and a simple step learning rate schedule... Table 7: Hyperparameter list to reproduce soft label baseline results in Table 1 and Table 2. (An illustrative recipe sketch appears after the table.)
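
To make the "Learn soft label with BPTT" entry concrete, here is a minimal sketch of learning soft labels with backpropagation through time: unroll a short inner training run of a student on (image, soft label) pairs, evaluate the unrolled student on real data, and backpropagate that meta-loss through the whole run into the labels. It is illustrative only; the student network, unroll length, learning rates, and initialization below are assumptions, not the authors' Algorithm 1 implementation (see the linked repository for that).

```python
# Hedged sketch of soft-label learning with BPTT; all sizes and hyperparameters
# are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

# Stand-in student model; the paper uses standard ConvNet/ResNet experts.
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Fixed images paired with learnable soft labels (initialized near one-hot).
images = torch.randn(100, 3, 32, 32)                       # placeholder images
hard_targets = torch.randint(0, 10, (100,))
soft_logits = nn.Parameter(5.0 * F.one_hot(hard_targets, 10).float())

# Placeholder "real" data used for the outer (meta) objective.
real_x, real_y = torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))

outer_opt = torch.optim.Adam([soft_logits], lr=1e-2)
inner_lr, unroll_steps = 0.05, 5

for outer_step in range(100):
    # Fresh copy of the student's initial weights for each unrolled run.
    params = {k: v.detach().clone().requires_grad_(True)
              for k, v in student.named_parameters()}

    # Inner loop: differentiable SGD on (images, soft labels).
    for _ in range(unroll_steps):
        logits = functional_call(student, params, (images,))
        inner_loss = F.kl_div(F.log_softmax(logits, dim=1),
                              F.softmax(soft_logits, dim=1),
                              reduction="batchmean")
        grads = torch.autograd.grad(inner_loss, list(params.values()),
                                    create_graph=True)
        params = {k: p - inner_lr * g
                  for (k, p), g in zip(params.items(), grads)}

    # Outer loss on real data; gradients flow back through the whole inner
    # run (the "time" dimension of BPTT) into the soft labels.
    outer_loss = F.cross_entropy(functional_call(student, params, (real_x,)), real_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```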
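The experiment-setup entry describes training experts with "an SGD optimizer and a simple step learning rate schedule". The sketch below shows what such a recipe typically looks like; the model choice and the concrete values (learning rate, momentum, weight decay, step size, epochs) are illustrative assumptions, not the hyperparameters from the paper's Table 7.

```python
# Illustrative expert-training recipe: SGD with momentum plus a step LR schedule.
# Values below are assumptions for illustration, not the paper's Table 7 settings.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet18(num_classes=100)      # stand-in expert network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

def train_expert(train_loader, epochs=90):
    model.train()
    for epoch in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()   # decay the learning rate by 10x every 30 epochs
    return model
```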