Efficient Dataset Distillation using Random Feature Approximation

Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We applied our algorithm to five datasets: MNIST, Fashion MNIST, SVHN, CIFAR-10 and CIFAR-100 [LeCun et al., 2010, Xiao et al., 2017, Netzer et al., 2011, Krizhevsky et al., 2009], distilling the datasets to coresets with 1, 10 or 50 images per class. Table 1 summarizes the results. We observe that in the fixed label configuration, our method outperforms other models in almost every dataset. In particular, it outperforms KIP by up to 6.1% in the CIFAR-10 10 img/cls setting."
Researcher Affiliation | Academia | "Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus; Computer Science and Artificial Intelligence Lab (CSAIL), Massachusetts Institute of Technology (MIT); {loo, rhasani, amini, rus}@mit.edu"
Pseudocode | Yes | "Algorithm 1: Dataset distillation with NNGP random features" (a rough sketch of this procedure follows the table)
Open Source Code | Yes | "Code is available at https://github.com/yolky/RFAD"
Open Datasets | Yes | "We applied our algorithm to five datasets: MNIST, Fashion MNIST, SVHN, CIFAR-10 and CIFAR-100 [LeCun et al., 2010, Xiao et al., 2017, Netzer et al., 2011, Krizhevsky et al., 2009]"
Dataset Splits | Yes | "Require: Training set and labels X_T, y_T ... Sample batch from the training set X_B, y_B ... Unlike typical Platt scaling, we learn τ jointly with our support set instead of post-hoc tuning on a separate validation set."
Hardware Specification | No | The paper mentions running on "a single GPU" and discusses "GPU hours" but does not specify the exact model or type of GPU, CPU, or other hardware component used for the experiments.
Software Dependencies | No | The paper mentions using the "neural-tangents library [Novak et al., 2020]" but does not provide specific version numbers for this or any other software component, which is required for reproducibility.
Experiment Setup | Yes | "During training, we used N = 8 random models, each with C = 256 convolutional channels per layer... We consider both the fixed and learned label configurations, with Platt scaling applied and no data augmentation. We used the regularized Zero Component Analysis (ZCA) preprocessing... we used a 1024-width finite network... We find that for these small datasets, this modification significantly improves performance. The second trick is label scaling; we scale the target labels by a factor α > 1... We reran our algorithm on CIFAR-10 and Fashion-MNIST, using either 1, 2, 4, or 8 models during training, using MSE loss or cross-entropy loss." (sketches of the Platt-scaled loss, label scaling, and ZCA preprocessing follow the table)
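
The Pseudocode row only names Algorithm 1, so here is a minimal PyTorch sketch of the underlying idea: an empirical NNGP-style kernel built from frozen random finite-width networks, kernel ridge regression from the distilled support set onto a training batch, and gradient steps on the support images. All names and values (make_random_convnet, n_models, reg, the MSE loss, the feature normalization) are illustrative assumptions, not the authors' implementation; the released code at https://github.com/yolky/RFAD is the authoritative reference.

```python
# Minimal sketch of dataset distillation with random NNGP-style features.
# NOT the authors' implementation; see https://github.com/yolky/RFAD for the real code.
import torch
import torch.nn as nn

def make_random_convnet(in_ch=3, channels=256, feat_dim=256):
    # Hypothetical frozen random ConvNet used as a feature extractor.
    net = nn.Sequential(
        nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2),
        nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(), nn.AvgPool2d(2),
        nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(channels, feat_dim),
    )
    for p in net.parameters():
        p.requires_grad_(False)  # weights stay random; only the coreset is learned
    return net

def random_features(x, nets):
    # Concatenating the outputs of N random nets gives features phi such that
    # phi(x) @ phi(x').T approximates an NNGP-style kernel (up to scaling).
    feats = torch.cat([net(x) for net in nets], dim=1)
    return feats / feats.shape[1] ** 0.5

def distill(x_train, y_train, n_support=100, n_models=8, steps=1000, lr=1e-2, reg=1e-6):
    # y_train is assumed to be float regression targets (e.g. one-hot) of shape [n, classes].
    idx = torch.randperm(len(x_train))[:n_support]
    x_s = x_train[idx].clone().requires_grad_(True)   # learnable distilled images
    y_s = y_train[idx].clone()                        # fixed-label configuration
    opt = torch.optim.Adam([x_s], lr=lr)

    for _ in range(steps):
        nets = [make_random_convnet(in_ch=x_train.shape[1]) for _ in range(n_models)]
        batch = torch.randint(0, len(x_train), (256,))
        x_b, y_b = x_train[batch], y_train[batch]

        phi_s = random_features(x_s, nets)
        phi_b = random_features(x_b, nets)
        k_ss = phi_s @ phi_s.T                        # support-support kernel
        k_bs = phi_b @ phi_s.T                        # batch-support kernel
        coef = torch.linalg.solve(k_ss + reg * torch.eye(n_support), y_s)
        loss = ((k_bs @ coef - y_b) ** 2).mean()      # kernel ridge regression + MSE

        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_s.detach(), y_s
```

The sketch only conveys the overall structure quoted above (N random models, kernel regression, MSE or cross-entropy loss); the exact kernel computation, losses, and learned-label variant differ in the paper.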
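The Dataset Splits and Experiment Setup rows also quote two training details: a Platt-scaling temperature τ learned jointly with the support set, and scaling of the target labels by a factor α > 1. Below is a hedged sketch of how such terms could be wired in; the names log_tau, platt_ce_loss, scaled_targets, and the value alpha = 2.0 are assumptions for illustration, not taken from the paper or its code.

```python
import torch
import torch.nn.functional as F

# Temperature for Platt scaling, learned jointly with the distilled images
# (unlike standard post-hoc Platt scaling on a held-out validation set).
log_tau = torch.zeros(1, requires_grad=True)

def platt_ce_loss(kernel_pred, y_int):
    # kernel_pred: real-valued kernel-regression outputs [batch, classes];
    # y_int: integer class labels [batch].
    logits = kernel_pred * torch.exp(log_tau)   # scale predictions by tau
    return F.cross_entropy(logits, y_int)

def scaled_targets(y_int, n_classes, alpha=2.0):
    # Label scaling: multiply the one-hot regression targets by alpha > 1.
    # (Whether labels are also centered is not specified in the excerpt above.)
    return alpha * F.one_hot(y_int, n_classes).float()

# Usage sketch: optimize tau together with the support images, e.g.
# opt = torch.optim.Adam([x_s, log_tau], lr=1e-2)
```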
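The Experiment Setup row additionally mentions regularized ZCA preprocessing. A generic sketch of regularized ZCA whitening follows; the regularization constant lam and the exact form of the regularizer are assumptions, since the excerpt does not quote the paper's values.

```python
import torch

def zca_whiten(x, lam=0.1):
    # x: [n, d] flattened training images; lam: regularization on the eigenvalues.
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.T @ x / x.shape[0]
    eigvals, eigvecs = torch.linalg.eigh(cov)
    # Regularized inverse square root of the covariance (ZCA transform).
    w = eigvecs @ torch.diag((eigvals + lam) ** -0.5) @ eigvecs.T
    return x @ w
```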