Elucidating the Design Space of Dataset Condensation

Authors: Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%. This performance surpasses those of SRe2L, G-VBSM, and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively. To validate the effectiveness of our proposed EDC, we conduct comparative experiments across various datasets, including ImageNet-1k (Russakovsky et al., 2015), ImageNet-10 (Kim et al., 2022), Tiny-ImageNet (Tavanaei, 2020), CIFAR-100 (Krizhevsky et al., 2009), and CIFAR-10 (Krizhevsky et al., 2009). Additionally, we explore cross-architecture generalization and ablation studies on ImageNet-1k.
Researcher Affiliation | Academia | Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen; Mohamed bin Zayed University of AI; Tsinghua University; The Hong Kong University of Science and Technology (Guangzhou)
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or structured in the paper.
Open Source Code | Yes | Our code is available at: https://github.com/shaoshitong/EDC.
Open Datasets | Yes | To validate the effectiveness of our proposed EDC, we conduct comparative experiments across various datasets, including ImageNet-1k (Russakovsky et al., 2015), ImageNet-10 (Kim et al., 2022), Tiny-ImageNet (Tavanaei, 2020), CIFAR-100 (Krizhevsky et al., 2009), and CIFAR-10 (Krizhevsky et al., 2009).
Dataset Splits | No | While the paper mentions 'train', 'validation', and 'test' sets, it does not state percentages or sample counts for these splits. It uses standard datasets, which typically have predefined splits, but these are not detailed in the paper's text or appendix.
Hardware Specification | Yes | All experiments are conducted using 4 RTX 4090 GPUs.
Software Dependencies | No | The paper mentions 'torchvision (Paszke et al., 2019)' but does not provide specific version numbers for software dependencies like PyTorch or torchvision.
Experiment Setup | Yes | We detail the hyperparameter settings of EDC for various datasets, including ImageNet-1k, ImageNet-10, Tiny-ImageNet, CIFAR-100, and CIFAR-10, in Tables 6, 7, 8, 9, and 10, respectively. For epochs, a critical factor affecting computational cost, we utilize strategies from SRe2L, G-VBSM, and RDED for ImageNet-1k and follow RDED for the other datasets. In the data synthesis phase, we reduce the iteration count of hyperparameters by half compared to those used in SRe2L and G-VBSM.
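
The headline numbers quoted under Research Type can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions not stated on this page: that the standard ImageNet-1k training set contains 1,281,167 images across 1,000 classes, and that the quoted margins are absolute percentage points. The function names are hypothetical and do not come from the EDC repository.

```python
# Assumption: standard ImageNet-1k training split size and class count.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167
IMAGENET_1K_CLASSES = 1_000


def compression_ratio(ipc, num_classes, full_train_size):
    """Fraction of the original training set kept at a given images-per-class (IPC)."""
    return ipc * num_classes / full_train_size


def implied_baseline(edc_accuracy, margin):
    """Baseline accuracy implied by a margin, assuming absolute percentage points."""
    return round(edc_accuracy - margin, 1)


# IPC 10 on ImageNet-1k keeps 10 * 1000 = 10,000 synthetic images.
ratio = compression_ratio(10, IMAGENET_1K_CLASSES, IMAGENET_1K_TRAIN_IMAGES)
print(f"compression ratio: {ratio:.2%}")

# Margins quoted above for SRe2L, G-VBSM, and RDED against EDC's 48.6%.
margins = {"SRe2L": 27.3, "G-VBSM": 17.2, "RDED": 6.6}
baselines = {name: implied_baseline(48.6, m) for name, m in margins.items()}
for name, acc in baselines.items():
    print(f"{name}: {acc}%")
```

Under these assumptions the ratio rounds to the 0.78% quoted above, and the implied baseline accuracies are 21.3%, 31.4%, and 42.0% for SRe2L, G-VBSM, and RDED, respectively.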