A Label is Worth A Thousand Images in Dataset Distillation
Authors: Tian Qin, Zhiwei Deng, David Alvarez-Melis
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of ablation experiments, we study the role of soft labels in depth. Our results reveal that the main factor explaining the performance of state-of-the-art distillation methods is not the specific techniques used to generate synthetic data but rather the use of soft labels. |
| Researcher Affiliation | Collaboration | Tian Qin, Harvard University, Cambridge, MA, tqin@g.harvard.edu; Zhiwei Deng, Google DeepMind, Mountain View, CA, zhiweideng@google.com; David Alvarez-Melis, Harvard University & MSR, Cambridge, MA, dam@seas.harvard.edu |
| Pseudocode | Yes | Algorithm 1: Learn soft labels with BPTT (a minimal sketch of this idea follows the table). |
| Open Source Code | Yes | Code for all experiments is available at https://github.com/sunnytqin/no-distillation. |
| Open Datasets | Yes | Table 1: Benchmark SOTA methods against CutMix baseline and soft label baseline on ImageNet-1K. Table 2: Benchmark SOTA methods against soft label baseline ("SL baseline") on Tiny ImageNet, CIFAR-100 and CIFAR-10. |
| Dataset Splits | No | The paper does not explicitly specify validation dataset splits or how they were derived from the training data, although it does discuss expert training and hyperparameter tuning, which often implies a validation set is used. It references a 'standard training recipe' but does not detail the splits. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 SXM4 40GB or NVIDIA H100 80GB HBM3. |
| Software Dependencies | No | The paper mentions 'PyTorch' and cites a paper [21] from 2019 about it, implying a version context from that year. However, it does not provide a specific version number (e.g., 'PyTorch 1.9') for the software dependency. |
| Experiment Setup | Yes | We follow a standard training recipe to train experts on downsized ImageNet-1K, Tiny ImageNet, CIFAR-10, and CIFAR-100. This standard training recipe involves an SGD optimizer and a simple step learning rate schedule... Table 7: Hyperparameter list to reproduce soft label baseline results in Table 1 and Table 2. (A sketch of such a recipe follows the table.) |
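
The paper's Algorithm 1 learns soft labels by backpropagating through an unrolled inner training loop (BPTT). The sketch below is a minimal illustration of that idea, not the authors' implementation: the function name `learn_soft_labels`, the use of `torch.func.functional_call`, and all hyperparameter values are assumptions made for clarity.

```python
import torch
import torch.nn.functional as F

def learn_soft_labels(distilled_images, real_loader, make_model,
                      num_classes=10, outer_steps=100, inner_steps=10,
                      inner_lr=0.01, outer_lr=0.1, device="cpu"):
    # Soft labels (class logits per distilled image) are the trainable parameters.
    soft_labels = torch.zeros(len(distilled_images), num_classes,
                              device=device, requires_grad=True)
    outer_opt = torch.optim.Adam([soft_labels], lr=outer_lr)
    distilled_images = distilled_images.to(device)

    for _ in range(outer_steps):
        model = make_model().to(device)
        # Work on differentiable copies of the parameters so that inner-loop
        # updates remain connected to the autograd graph.
        params = {k: v.clone() for k, v in dict(model.named_parameters()).items()}

        # Inner loop: train the student on (images, soft labels) with plain SGD,
        # keeping create_graph=True so gradients can flow back to soft_labels.
        for _ in range(inner_steps):
            logits = torch.func.functional_call(model, params, (distilled_images,))
            loss = F.kl_div(F.log_softmax(logits, dim=1),
                            F.softmax(soft_labels, dim=1), reduction="batchmean")
            grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
            params = {k: v - inner_lr * g
                      for (k, v), g in zip(params.items(), grads)}

        # Outer objective: the trained student should classify real data well.
        x_real, y_real = next(iter(real_loader))
        x_real, y_real = x_real.to(device), y_real.to(device)
        outer_loss = F.cross_entropy(
            torch.func.functional_call(model, params, (x_real,)), y_real)

        outer_opt.zero_grad()
        outer_loss.backward()  # BPTT through the unrolled inner steps
        outer_opt.step()

    return soft_labels.detach()
```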
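
The "standard training recipe" for the expert networks is described only as SGD with a simple step learning-rate schedule, with exact hyperparameters listed in the paper's Table 7. The sketch below shows what such a recipe typically looks like in PyTorch; all specific values (learning rate, momentum, step size, epochs) are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

def train_expert(model, train_loader, epochs=50, lr=0.1, momentum=0.9,
                 weight_decay=5e-4, step_size=20, gamma=0.1, device="cpu"):
    # SGD with momentum plus a step learning-rate schedule, as in a typical
    # expert-training recipe for CIFAR/Tiny ImageNet-scale datasets.
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum,
                                weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size,
                                                gamma=gamma)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # decay the learning rate every `step_size` epochs
    return model
```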