Elucidating the Design Space of Dataset Condensation
Authors: Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%. This performance surpasses those of SRe2L, G-VBSM, and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively." and "To validate the effectiveness of our proposed EDC, we conduct comparative experiments across various datasets, including ImageNet-1k (Russakovsky et al., 2015), ImageNet-10 (Kim et al., 2022), Tiny-ImageNet (Tavanaei, 2020), CIFAR-100 (Krizhevsky et al., 2009), and CIFAR-10 (Krizhevsky et al., 2009). Additionally, we explore cross-architecture generalization and ablation studies on ImageNet-1k." |
| Researcher Affiliation | Academia | Shitong Shao, Zikai Zhou, Huanran Chen, Zhiqiang Shen. Mohamed bin Zayed University of AI, Tsinghua University, The Hong Kong University of Science and Technology (Guangzhou) |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or structured in the paper. |
| Open Source Code | Yes | Our code is available at: https://github.com/shaoshitong/EDC. |
| Open Datasets | Yes | To validate the effectiveness of our proposed EDC, we conduct comparative experiments across various datasets, including ImageNet-1k (Russakovsky et al., 2015), ImageNet-10 (Kim et al., 2022), Tiny-ImageNet (Tavanaei, 2020), CIFAR-100 (Krizhevsky et al., 2009), and CIFAR-10 (Krizhevsky et al., 2009). |
| Dataset Splits | No | While the paper mentions 'train', 'validation', and 'test' sets, it does not explicitly provide the specific percentages or sample counts for these splits. It uses standard datasets, which typically have predefined splits, but these are not explicitly detailed in the paper's text or appendix. |
| Hardware Specification | Yes | All experiments are conducted using 4 RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions 'torchvision (Paszke et al., 2019)' but does not provide specific version numbers for software dependencies like PyTorch or torchvision. |
| Experiment Setup | Yes | We detail the hyperparameter settings of EDC for various datasets, including ImageNet-1k, ImageNet-10, Tiny-ImageNet, CIFAR-100, and CIFAR-10, in Tables 6, 7, 8, 9, and 10, respectively. For epochs, a critical factor affecting computational cost, we utilize strategies from SRe2L, G-VBSM, and RDED for ImageNet-1k and follow RDED for the other datasets. In the data synthesis phase, we reduce the iteration count of hyperparameters by half compared to those used in SRe2L and G-VBSM. |
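
The headline figures in the Research Type row can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes the standard ImageNet-1k (ILSVRC-2012) training set size of 1,281,167 images, which is not stated in the table, and it reads the quoted margins as percentage-point differences in top-1 accuracy. The baseline accuracies it prints are implied by those margins, not quoted from the paper.

```python
# Assumption: standard ILSVRC-2012 training set size; not given in the table above.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167
NUM_CLASSES = 1_000
IPC = 10  # images per class, as quoted in the table


def compression_ratio(ipc: int, num_classes: int, full_size: int) -> float:
    """Fraction of the full training set retained in the condensed set."""
    return ipc * num_classes / full_size


ratio = compression_ratio(IPC, NUM_CLASSES, IMAGENET_1K_TRAIN_IMAGES)
print(f"compression ratio: {ratio:.2%}")  # prints "compression ratio: 0.78%"

# Quoted EDC accuracy and margins (interpreted as percentage points).
EDC_ACC = 48.6
margins = {"SRe2L": 27.3, "G-VBSM": 17.2, "RDED": 6.6}

# Baseline accuracies implied by the margins (derived, not quoted).
implied = {name: round(EDC_ACC - m, 1) for name, m in margins.items()}
for name, acc in implied.items():
    print(f"{name}: {acc}%")
```

Under these assumptions, 10,000 condensed images out of 1,281,167 matches the quoted 0.78% compression ratio.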