Dataset Condensation via Efficient Synthetic-Data Parameterization

Authors: Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, Hyun Oh Song

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a unified algorithm that drastically improves the quality of condensed data against the current state-of-the-art on CIFAR-10, ImageNet, and Speech Commands. ... In this section, we evaluate the performance of our condensation algorithm over various datasets and tasks. We first evaluate our condensed data from CIFAR-10, ImageNet-subset, and Speech Commands by training neural networks from scratch on the condensed data (Krizhevsky et al., 2009; Deng et al., 2009; Warden, 2018). Next, we investigate the proposed algorithm by performing ablation analysis and controlled experiments.
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Seoul National University; (2) NAVER AI Lab; (3) University of Tübingen; (4) Image Vision, NAVER Clova.
Pseudocode | Yes | Algorithm 1: Information-Intensive Dataset Condensation
Open Source Code | Yes | We release the source code at https://github.com/snu-mllab/Efficient-Dataset-Condensation.
Open Datasets | Yes | We first evaluate our condensed data from CIFAR-10, ImageNet-subset, and Speech Commands by training neural networks from scratch on the condensed data (Krizhevsky et al., 2009; Deng et al., 2009; Warden, 2018). ... We downloaded the Mini Speech Commands from the official TensorFlow page.
Dataset Splits | No | The paper does not explicitly describe a validation split (e.g., percentages or counts for a validation set) for its experiments; it mentions training and test sets but gives no details of a distinct validation split.
Hardware Specification | Yes | We measure training times on an RTX-3090 GPU.
Software Dependencies | No | The paper mentions the TensorFlow page for Mini Speech Commands, implying the use of TensorFlow, but does not provide version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | The other implementation details and hyperparameter settings of our algorithm are described in Appendix C.1. ... In all of the experiments, we fix the number of inner iterations M = 100 (Algorithm 1). For CIFAR-10, we use data learning rate λ = 0.005, network learning rate η = 0.01, and the MSE objective. ... We train neural networks on the condensed data for 1,000 epochs with a 0.01 learning rate.
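The Experiment Setup row quotes concrete hyperparameters (M = 100 inner iterations, data learning rate λ = 0.005, network learning rate η = 0.01, an MSE matching objective, and 1,000 evaluation epochs at learning rate 0.01). The sketch below, assuming a PyTorch environment, collects those values into a configuration and shows how a fresh network would be trained from scratch on a condensed set for evaluation. The SmallConvNet model, the placeholder tensors, and the evaluate_condensed_data helper are hypothetical illustrations, not the authors' released implementation (which is available at the GitHub link above).

```python
# Minimal sketch of the quoted evaluation protocol, assuming PyTorch.
# Hyperparameter values come from the Experiment Setup row; the model and
# the random "condensed" tensors are stand-ins, not the released IDC code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Condensation hyperparameters quoted for CIFAR-10.
CONDENSE_CFG = {
    "inner_iterations_M": 100,  # fixed number of inner iterations (Algorithm 1)
    "data_lr_lambda": 0.005,    # learning rate for the synthetic data
    "network_lr_eta": 0.01,     # learning rate for the matching networks
    "objective": "mse",         # MSE matching objective
}

# Evaluation schedule: train a network from scratch on the condensed data.
EVAL_EPOCHS = 1000
EVAL_LR = 0.01


class SmallConvNet(nn.Module):
    """Illustrative ConvNet classifier for 32x32 inputs (stand-in only)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(128 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def evaluate_condensed_data(images: torch.Tensor, labels: torch.Tensor,
                            device: str = "cpu") -> nn.Module:
    """Train a fresh network on the condensed set for the quoted schedule."""
    model = SmallConvNet(num_classes=int(labels.max()) + 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=EVAL_LR, momentum=0.9)
    loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

    for _ in range(EVAL_EPOCHS):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return model


if __name__ == "__main__":
    # Placeholder condensed set: 10 images per class for CIFAR-10-sized data.
    fake_images = torch.randn(100, 3, 32, 32)
    fake_labels = torch.arange(10).repeat_interleave(10)
    evaluate_condensed_data(fake_images, fake_labels)
```

Network architecture, data augmentation, and the remaining implementation details follow Appendix C.1 of the paper and are not reproduced in this sketch.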