M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Authors: Hansong Zhang, Shikun Li, Pengju Wang, Dan Zeng, Shiming Ge
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive analysis is conducted to verify the effectiveness of the proposed method. Source codes are available at https://github.com/Hansong-Zhang/M3D. Experiments: In this section, we begin by comparing our proposed M3D with SOTA baselines on multiple benchmark datasets. Subsequently, we conduct an in-depth examination of M3D through ablation analysis. |
| Researcher Affiliation | Academia | Hansong Zhang1, 2*, Shikun Li1, 2*, Pengju Wang1, 2, Dan Zeng3, Shiming Ge1, 2 1Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100092, China 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China 3Department of Communication Engineering, Shanghai University, Shanghai 200040, China |
| Pseudocode | Yes | The pseudo-code of M3D is provided in the Appendix. |
| Open Source Code | Yes | Source codes are available at https://github.com/Hansong-Zhang/M3D. |
| Open Datasets | Yes | Our evaluation encompasses five low-resolution datasets: MNIST (LeCun et al. 1998), Fashion-MNIST (FMNIST) (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011), CIFAR-10 (Krizhevsky, Hinton et al. 2009), and CIFAR-100 (Krizhevsky, Hinton et al. 2009). In addition, we also conduct experiments on the high-resolution ImageNet subsets (Deng et al. 2009). |
| Dataset Splits | No | The paper does not provide explicit details about train/validation/test splits, nor does it explicitly mention a 'validation set' used for hyperparameter tuning. It only describes the training process and evaluation on a test set. |
| Hardware Specification | Yes | The minimal time required for obtaining the best performance is reported, which is measured on a single RTX-A6000 GPU with the same batch size. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with their version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The number of iterations is set to 10K for all low-resolution datasets. For ImageNet subsets, we set 1K iterations. Additionally, the number of iterations per model is consistently set to 5 across all datasets. Regarding the learning rates for the condensed data, we assign a value of 1 for low-resolution datasets including F-MNIST, SVHN and CIFAR-10/100. For ImageNet subsets, we adopt a learning rate of 1e-1. Following IDC (Kim et al. 2022), the factor parameter l is set to 2 for low-resolution datasets and 3 for ImageNet subsets. |
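
To make the Experiment Setup row concrete, below is a minimal, illustrative sketch of condensing data by minimizing an RBF-kernel MMD estimate between real and synthetic features. It is not the authors' implementation (the linked repository is authoritative): the embedding network, tensor shapes, kernel bandwidth, and batch handling are placeholder assumptions, and per-class matching and the factor-l multi-formation from the paper are omitted. Only the low-resolution iteration count (10K) and condensed-data learning rate (1) are taken from the row above.

```python
import torch
import torch.nn as nn

def gaussian_kernel(a, b, sigma=1.0):
    """RBF kernel matrix between the rows of a and b."""
    dist2 = torch.cdist(a, b).pow(2)            # pairwise squared distances
    return torch.exp(-dist2 / (2 * sigma ** 2))

def mmd2(real_feat, syn_feat, sigma=1.0):
    """Biased empirical estimate of squared Maximum Mean Discrepancy."""
    k_rr = gaussian_kernel(real_feat, real_feat, sigma).mean()
    k_ss = gaussian_kernel(syn_feat, syn_feat, sigma).mean()
    k_rs = gaussian_kernel(real_feat, syn_feat, sigma).mean()
    return k_rr + k_ss - 2 * k_rs

# Placeholder embedding network and learnable condensed images (assumed shapes);
# the paper samples embedding networks and matches distributions per class.
embed_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
for p in embed_net.parameters():
    p.requires_grad_(False)                     # only the condensed images are optimized
syn_images = torch.randn(100, 3, 32, 32, requires_grad=True)

# Low-resolution settings reported above: 10K iterations, learning rate 1
# for the condensed data.
optimizer = torch.optim.SGD([syn_images], lr=1.0)
for _ in range(10_000):
    real_batch = torch.randn(256, 3, 32, 32)    # stand-in for a sampled real batch
    loss = mmd2(embed_net(real_batch), embed_net(syn_images))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

This sketch only shows the shape of the objective and the optimization loop; for the actual M3D loss, network sampling schedule, and the ImageNet-subset settings, refer to the released code at https://github.com/Hansong-Zhang/M3D.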