Dataset Condensation via Efficient Synthetic-Data Parameterization
Authors: Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, Hyun Oh Song
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a unified algorithm that drastically improves the quality of condensed data against the current state-of-the-art on CIFAR-10, ImageNet, and Speech Commands. ... In this section, we evaluate the performance of our condensation algorithm over various datasets and tasks. We first evaluate our condensed data from CIFAR-10, ImageNet subset, and Speech Commands by training neural networks from scratch on the condensed data (Krizhevsky et al., 2009; Deng et al., 2009; Warden, 2018). Next, we investigate the proposed algorithm by performing ablation analysis and controlled experiments. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Seoul National University; (2) NAVER AI Lab; (3) University of Tübingen; (4) Image Vision, NAVER Clova. |
| Pseudocode | Yes | Algorithm 1 Information-Intensive Dataset Condensation |
| Open Source Code | Yes | We release the source code at https://github.com/snu-mllab/Efficient-Dataset-Condensation. |
| Open Datasets | Yes | We first evaluate our condensed data from CIFAR-10, ImageNet subset, and Speech Commands by training neural networks from scratch on the condensed data (Krizhevsky et al., 2009; Deng et al., 2009; Warden, 2018). ... We downloaded the Mini Speech Commands from the official TensorFlow page. |
| Dataset Splits | No | The paper does not explicitly describe a validation split (e.g., percentages or counts for a held-out validation set) for its experiments. It mentions training and test sets but gives no details for a distinct validation split. |
| Hardware Specification | Yes | We measure training times on an RTX-3090 GPU. |
| Software Dependencies | No | The paper mentions downloading Mini Speech Commands from the official TensorFlow page, implying the use of TensorFlow, but it does not provide version numbers for any software or libraries used in the experiments. |
| Experiment Setup | Yes | The other implementation details and hyperparameter settings of our algorithm are described in Appendix C.1. ... In all of the experiments, we fix the number of inner iterations M = 100 (Algorithm 1). For CIFAR-10, we use data learning rate λ = 0.005, network learning rate η = 0.01, and the MSE objective. ... We train neural networks on the condensed data for 1,000 epochs with a 0.01 learning rate. (See the hedged sketches following this table.) |
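
The "Pseudocode" and "Experiment Setup" rows above quote a condensation loop with an MSE matching objective, M = 100 inner iterations, data learning rate λ = 0.005, and network learning rate η = 0.01. Below is a minimal sketch of one synthetic-data update under a standard gradient-matching reading of that setup; the model, optimizer wiring, and tensor shapes are hypothetical stand-ins, and the authors' exact Algorithm 1 lives in the released repository linked above.

```python
import torch
import torch.nn.functional as F

def gradient_match_step(model, synth_x, synth_y, real_x, real_y, data_opt):
    """One synthetic-data update: match the network's gradients on a
    synthetic batch to its gradients on a real batch with an MSE objective
    (a common gradient-matching formulation; the authors' exact loss is
    defined in their released Algorithm 1 implementation)."""
    criterion = torch.nn.CrossEntropyLoss()
    params = tuple(model.parameters())

    # Target gradients: network loss on a real batch, detached from the graph.
    real_loss = criterion(model(real_x), real_y)
    real_grads = [g.detach() for g in torch.autograd.grad(real_loss, params)]

    # Gradients on the synthetic batch, kept differentiable w.r.t. synth_x.
    synth_loss = criterion(model(synth_x), synth_y)
    synth_grads = torch.autograd.grad(synth_loss, params, create_graph=True)

    # MSE between the two gradient sets, summed over parameter tensors.
    match_loss = sum(F.mse_loss(sg, rg)
                     for sg, rg in zip(synth_grads, real_grads))

    data_opt.zero_grad()
    match_loss.backward()   # backpropagates through synth_grads to synth_x
    data_opt.step()         # data learning rate lambda = 0.005
    return match_loss.item()

# Hypothetical usage for CIFAR-10-shaped data:
# synth_x = torch.randn(100, 3, 32, 32, requires_grad=True)
# synth_y = torch.arange(10).repeat_interleave(10)
# data_opt = torch.optim.SGD([synth_x], lr=0.005)   # lambda = 0.005
```

Between data updates, the quoted setup also trains the network itself for M = 100 inner iterations at η = 0.01; the sketch omits that inner loop.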
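
The same row also quotes the evaluation protocol: networks are trained from scratch on the condensed data for 1,000 epochs with a 0.01 learning rate, and times are measured on an RTX-3090 GPU. A self-contained sketch of that protocol follows; the optimizer choice (SGD with momentum 0.9) and the model/loader arguments are assumptions, not details from the quoted text.

```python
import torch

def evaluate_condensed(model, condensed_loader, test_loader,
                       epochs=1000, lr=0.01):
    """Train a fresh network on the condensed set, then report test accuracy.
    SGD with momentum 0.9 is an assumed optimizer, not quoted in the paper."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):                 # 1,000 epochs at lr = 0.01
        for x, y in condensed_loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

    # Top-1 accuracy on the real test split.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```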