Dataset Distillation with Convexified Implicit Gradients

Authors: Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct a large experimental evaluation of our method in a diverse set of dataset distillation tasks and benchmarks and compare its performance to other advanced baselines." "In this section, we present our comprehensive experimental evaluation of our method, RCIG, compared to modern baselines using a diverse series of benchmarks and tasks."
Researcher Affiliation | Academia | "Noel Loo (1), Ramin Hasani (1), Mathias Lechner (1), Daniela Rus (1); (1) Computer Science and Artificial Intelligence Lab (CSAIL), Massachusetts Institute of Technology (MIT)."
Pseudocode | Yes | "Algorithm 1: Reparam Convexified Implicit Gradients"
Open Source Code | Yes | "Code available at https://github.com/yolky/RCIG"
Open Datasets | Yes | "We first ran RCIG on six standard benchmark tests including MNIST (10 classes) (LeCun et al., 1998), Fashion-MNIST (10 classes) (Xiao et al., 2017), CIFAR-10 (10 classes), CIFAR-100 (100 classes) (Krizhevsky, 2009), Tiny-ImageNet (200 classes) (Le & Yang, 2015), and Caltech Birds 2011 (200 classes) (Welinder et al., 2010)."
Dataset Splits | No | The paper trains on distilled datasets and evaluates on the standard test sets, but it does not specify explicit train/validation/test splits of the original datasets for hyperparameter tuning or model selection. Although the standard benchmarks have predefined splits, no validation split is described for the experimental procedure.
Hardware Specification | Yes | "We use a mix of Nvidia Titan RTXs with 24 GB, RTX 4090s with 24 GB, and Quadro RTX A6000s with 48 GB VRAM. The training time per iteration plots in Figure 1 and Figure 5 are run on an RTX 4090."
Software Dependencies | No | "Code is implemented using the libraries JAX, Optax, and Flax (Bradbury et al., 2018; Babuschkin et al., 2020; Heek et al., 2020)." The specific libraries are named, but no version numbers are given, which limits reproducibility.
Experiment Setup | Yes | "For the λ L2 regularization term, for depth 3 models we used λ = 0.0005 |S|, for depth 4 we use λ = 0.005 |S|, and for depth 5 we use λ = 0.05 |S|. For the coreset optimizer, we use the AdaBelief optimizer (Zhuang et al., 2020) with a learning rate of 0.003 for the coreset images and labels, and a learning rate of 0.03 for log τ, the Platt-scaling loss temperature. For inner optimization and Hessian inverse computation, we use the Adam optimizer (Kingma & Ba, 2015), with learning rates α_inner and α_H^{-1}. During evaluation, we train neural networks for 1000 iterations if |S| = 10, otherwise for 2000 iterations. We used the Adam optimizer with a learning rate of 0.0001. In line with prior work (Zhou et al., 2022), we use a learning rate schedule with 500 iterations of linear warmup followed by a cosine decay."
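The evaluation schedule quoted above (Adam at a 1e-4 peak rate with 500 warmup iterations followed by cosine decay, and AdaBelief for the distilled images/labels and the loss temperature) maps directly onto standard Optax primitives. The following is a minimal sketch assuming only those quoted hyperparameters; the variable names are illustrative and the snippet is not taken from the authors' released code.

```python
# Minimal sketch of the quoted training configuration using Optax
# (the optimization library named in the paper). Values follow the
# quoted setup; names such as `eval_optimizer` are illustrative.
import optax

# Evaluation-time training: Adam at 1e-4 with 500 iterations of linear
# warmup followed by cosine decay over the full run (2000 iterations,
# or 1000 when |S| = 10).
total_iterations = 2000
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=1e-4,
    warmup_steps=500,
    decay_steps=total_iterations,
)
eval_optimizer = optax.adam(learning_rate=schedule)

# Distillation-time optimizers: AdaBelief with learning rate 0.003 for
# the coreset images and labels, and 0.03 for log(tau), the
# Platt-scaling loss temperature.
coreset_optimizer = optax.adabelief(learning_rate=0.003)
temperature_optimizer = optax.adabelief(learning_rate=0.03)
```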