Dataset Condensation via Efficient Synthetic-Data Parameterization

Authors: Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, Hyun Oh Song

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a unified algorithm that drastically improves the quality of condensed data against the current state-of-the-art on CIFAR-10, ImageNet, and Speech Commands. ... In this section, we evaluate the performance of our condensation algorithm over various datasets and tasks. We first evaluate our condensed data from CIFAR-10, ImageNet-subset, and Speech Commands by training neural networks from scratch on the condensed data (Krizhevsky et al., 2009; Deng et al., 2009; Warden, 2018). Next, we investigate the proposed algorithm by performing ablation analysis and controlled experiments.
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Seoul National University; (2) NAVER AI Lab; (3) University of Tübingen; (4) Image Vision, NAVER Clova.
Pseudocode | Yes | Algorithm 1: Information-Intensive Dataset Condensation
Open Source Code | Yes | We release the source code at https://github.com/snu-mllab/Efficient-Dataset-Condensation.
Open Datasets | Yes | We first evaluate our condensed data from CIFAR-10, ImageNet-subset, and Speech Commands by training neural networks from scratch on the condensed data (Krizhevsky et al., 2009; Deng et al., 2009; Warden, 2018). ... We downloaded the Mini Speech Commands from the official TensorFlow page.
Dataset Splits | No | The paper does not explicitly describe a validation split (e.g., percentages or counts for a validation set) for its experiments; it mentions training and test sets but gives no details of a distinct validation split.
Hardware Specification | Yes | We measure training times on an RTX-3090 GPU.
Software Dependencies | No | The paper mentions the TensorFlow page for Mini Speech Commands, implying the use of TensorFlow, but does not provide version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | The other implementation details and hyperparameter settings of our algorithm are described in Appendix C.1. ... In all of the experiments, we fix the number of inner iterations M = 100 (Algorithm 1). For CIFAR-10, we use data learning rate λ = 0.005, network learning rate η = 0.01, and the MSE objective. ... We train neural networks on the condensed data for 1,000 epochs with a 0.01 learning rate.
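The Experiment Setup row quotes concrete hyperparameters (M = 100 inner iterations, data learning rate λ = 0.005, network learning rate η = 0.01, an MSE matching objective, and 1,000 evaluation epochs at learning rate 0.01). The sketch below, assuming a PyTorch environment, collects those values into a configuration and shows how a fresh network would be trained from scratch on a condensed set for evaluation. The SmallConvNet model, the placeholder tensors, and the evaluate_condensed_data helper are hypothetical illustrations, not the authors' released implementation (which is available at the GitHub link above).

```python
# Minimal sketch of the quoted evaluation protocol, assuming PyTorch.
# Hyperparameter values come from the Experiment Setup row; the model and
# the random "condensed" tensors are stand-ins, not the released IDC code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Condensation hyperparameters quoted for CIFAR-10.
CONDENSE_CFG = {
    "inner_iterations_M": 100,  # fixed number of inner iterations (Algorithm 1)
    "data_lr_lambda": 0.005,    # learning rate for the synthetic data
    "network_lr_eta": 0.01,     # learning rate for the matching networks
    "objective": "mse",         # MSE matching objective
}

# Evaluation schedule: train a network from scratch on the condensed data.
EVAL_EPOCHS = 1000
EVAL_LR = 0.01


class SmallConvNet(nn.Module):
    """Illustrative ConvNet classifier for 32x32 inputs (stand-in only)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(128 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def evaluate_condensed_data(images: torch.Tensor, labels: torch.Tensor,
                            device: str = "cpu") -> nn.Module:
    """Train a fresh network on the condensed set for the quoted schedule."""
    model = SmallConvNet(num_classes=int(labels.max()) + 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=EVAL_LR, momentum=0.9)
    loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

    for _ in range(EVAL_EPOCHS):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return model


if __name__ == "__main__":
    # Placeholder condensed set: 10 images per class for CIFAR-10-sized data.
    fake_images = torch.randn(100, 3, 32, 32)
    fake_labels = torch.arange(10).repeat_interleave(10)
    evaluate_condensed_data(fake_images, fake_labels)
```

Network architecture, data augmentation, and the remaining implementation details follow Appendix C.1 of the paper and are not reproduced in this sketch.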