Efficient Dataset Distillation using Random Feature Approximation
Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our algorithm to five datasets: MNIST, Fashion MNIST, SVHN, CIFAR-10 and CIFAR-100 [LeCun et al., 2010, Xiao et al., 2017, Netzer et al., 2011, Krizhevsky et al., 2009], distilling the datasets to coresets with 1, 10 or 50 images per class. Table 1 summarizes the results. We observe that in the fixed label configuration, our method outperforms other models in almost every dataset. In particular, it outperforms KIP by up to 6.1% in the CIFAR-10 10 img/cls setting. |
| Researcher Affiliation | Academia | Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus Computer Science and Artificial Intelligence Lab (CSAIL) Massachusetts Institute of Technology (MIT) {loo, rhasani, amini, rus}@mit.edu |
| Pseudocode | Yes | Algorithm 1 Dataset distillation with NNGP random features (a hedged code sketch of this procedure appears after the table) |
| Open Source Code | Yes | Code is available at https://github.com/yolky/RFAD |
| Open Datasets | Yes | We applied our algorithm to five datasets: MNIST, Fashion MNIST, SVHN, CIFAR-10 and CIFAR-100 [LeCun et al., 2010, Xiao et al., 2017, Netzer et al., 2011, Krizhevsky et al., 2009] |
| Dataset Splits | Yes | Require: Training set and labels X_T, y_T ... Sample batch from the training set X_B, y_B ... Unlike typical Platt scaling, we learn τ jointly with our support set instead of post-hoc tuning on a separate validation set. (A sketch of this jointly learned Platt scaling follows the table.) |
| Hardware Specification | No | The paper mentions running on "a single GPU" and discusses "GPU hours" but does not specify the exact model or type of GPU, CPU, or other hardware component used for the experiments. |
| Software Dependencies | No | The paper mentions using the "neural-tangents library [Novak et al., 2020]" but does not provide version numbers for this or any other software dependency, which would be needed for exact reproducibility. |
| Experiment Setup | Yes | During training, we used N = 8 random models, each with C = 256 convolutional channels per layer... We consider both the fixed and learned label configurations, with Platt scaling applied and no data augmentation. We used the regularized Zero Component Analysis (ZCA) preprocessing (sketched after the table)... we used a 1024-width finite network... We find that for these small datasets, this modification significantly improves performance. The second trick is label scaling; we scale the target labels by a factor α > 1... We reran our algorithm on CIFAR-10 and Fashion-MNIST, using either 1, 2, 4, or 8 models during training, using MSE loss or cross-entropy loss. |
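
A minimal sketch of the distillation loop referenced in the Pseudocode row, assuming PyTorch: the NNGP kernel is approximated by inner products of features from N freshly sampled random ConvNets, kernel ridge regression maps the learnable support set onto a real training batch, and the loss is backpropagated into the support images. The architecture depth, pooling, ridge regularizer `reg`, and helper names (`make_random_convnet`, `distill_step`) are illustrative assumptions, not the authors' exact implementation; the defaults N = 8 and C = 256 follow the settings quoted above.

```python
# Hedged sketch of dataset distillation with a random-feature approximation of
# the NNGP kernel (in the spirit of Algorithm 1). Architecture and regularizer
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_random_convnet(channels=256, depth=3, in_ch=3):
    """An untrained ConvNet feature extractor; a fresh one is sampled every step."""
    layers, c = [], in_ch
    for _ in range(depth):
        layers += [nn.Conv2d(c, channels, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2)]
        c = channels
    layers.append(nn.Flatten())
    return nn.Sequential(*layers)

def distill_step(x_support, y_support, x_batch, y_batch, n_models=8, channels=256, reg=1e-6):
    # Resample N random networks and share them between support and batch, so that
    # phi(x1) . phi(x2) approximates the same (empirical) NNGP kernel entry.
    nets = [make_random_convnet(channels, in_ch=x_support.shape[1]) for _ in range(n_models)]

    def phi(x):
        # Concatenate features from all N networks; the scaling makes the inner
        # product an average over models and channels.
        return torch.cat([net(x) for net in nets], dim=1) / (n_models * channels) ** 0.5

    phi_s, phi_b = phi(x_support), phi(x_batch)
    k_ss = phi_s @ phi_s.T                      # support-support Gram matrix
    k_bs = phi_b @ phi_s.T                      # batch-support Gram matrix
    # Kernel ridge regression from the distilled support set to the real batch.
    alpha = torch.linalg.solve(k_ss + reg * torch.eye(k_ss.shape[0]), y_support)
    preds = k_bs @ alpha
    return F.mse_loss(preds, y_batch)           # fixed-label, MSE configuration

# Toy usage: 1 image per class for a CIFAR-10-like dataset; random data stands
# in for a real training batch here.
x_support = nn.Parameter(torch.randn(10, 3, 32, 32))
y_support = F.one_hot(torch.arange(10)).float()
opt = torch.optim.Adam([x_support], lr=1e-3)
x_batch = torch.randn(128, 3, 32, 32)
y_batch = F.one_hot(torch.randint(0, 10, (128,)), num_classes=10).float()
loss = distill_step(x_support, y_support, x_batch, y_batch)
opt.zero_grad(); loss.backward(); opt.step()
```

In the learned-label configuration, `y_support` would also be an `nn.Parameter`; the cross-entropy variant replaces the MSE loss with the Platt-scaled loss sketched next.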
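
The Dataset Splits row notes that the Platt-scaling temperature τ is learned jointly with the support set rather than tuned post hoc on a separate validation split. A minimal sketch of that joint learning, assuming the KRR outputs from the sketch above and an illustrative log-parameterization of τ:

```python
# Hedged sketch of jointly learned Platt scaling: cross-entropy on temperature-scaled
# KRR outputs, with log(tau) optimized alongside the distilled images.
import torch
import torch.nn.functional as F

log_tau = torch.zeros((), requires_grad=True)   # tau = exp(log_tau) > 0 by construction

def platt_ce_loss(krr_preds, y_int):
    """krr_preds: [B, num_classes] real-valued KRR outputs; y_int: [B] integer labels."""
    return F.cross_entropy(log_tau.exp() * krr_preds, y_int)

# tau simply joins the distillation optimizer, e.g.:
# opt = torch.optim.Adam([x_support, log_tau], lr=1e-3)
```

The label-scaling trick quoted in the Experiment Setup row (multiplying the targets by a factor α > 1) would analogously amount to rescaling `y_support` before the KRR solve.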
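
The Experiment Setup row also mentions regularized ZCA preprocessing. The following is a generic sketch of regularized ZCA whitening on flattened images; the regularizer `eps` and the exact normalization are assumptions, as the quoted passage does not give the formula.

```python
# Hedged sketch of regularized ZCA whitening applied to flattened images.
# eps is an illustrative regularizer, not the paper's exact setting.
import torch

def zca_whiten(x_flat, eps=0.1):
    """x_flat: [num_images, num_pixels]; returns whitened images and the ZCA matrix."""
    xc = x_flat - x_flat.mean(dim=0, keepdim=True)
    cov = xc.T @ xc / xc.shape[0]
    eigvals, eigvecs = torch.linalg.eigh(cov)
    w_zca = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.T  # regularized cov^(-1/2)
    return xc @ w_zca, w_zca
```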