DANCE: Dual-View Distribution Alignment for Dataset Condensation
Authors: Hansong Zhang, Shikun Li, Fanzhao Lin, Weiping Wang, Zhenxing Qian, Shiming Ge
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the proposed method achieves SOTA performance while maintaining comparable efficiency with the original DM across various scenarios. |
| Researcher Affiliation | Academia | Hansong Zhang (1,2), Shikun Li (1,2), Fanzhao Lin (1,2), Weiping Wang (1,2), Zhenxing Qian (3), Shiming Ge (1,2). Affiliations: 1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100092, China; 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; 3 School of Computer Science, Fudan University, Shanghai 200433, China. Emails: {zhanghansong,lishikun,linfanzhao,wangweiping,geshiming}@iie.ac.cn, zxqian@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Dual-View Distribution Alignment for Dataset Condensation |
| Open Source Code | Yes | Source codes are available at https://github.com/Hansong-Zhang/DANCE. |
| Open Datasets | Yes | Datasets. We assess our method using three low-resolution datasets: Fashion-MNIST [Xiao et al., 2017] with a resolution of 28×28, and CIFAR-10/100 [Krizhevsky, 2009] with a resolution of 32×32. For medium-resolution data, we utilize the resized Tiny ImageNet [Le and Yang, 2015], which has a resolution of 64×64. Furthermore, in alignment with MTT [Cazenavette et al., 2022], we employ various subsets of the high-resolution ImageNet-1K [Deng et al., 2009] dataset (resolution 128×128) in our experiments. |
| Dataset Splits | No | The paper mentions training networks and evaluating test accuracy but does not specify a separate validation dataset split or its details. |
| Hardware Specification | Yes | The experiments are conducted on a GPU group comprising GTX 3090, RTX-2080, and NVIDIA-A100 GPUs. |
| Software Dependencies | No | The paper mentions using an SGD optimizer but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For training, we employ an SGD optimizer with a learning rate of 0.01, momentum of 0.9, and weight decay of 0.0005. The expert models θexpert are trained for 60 epochs on low-resolution datasets and Tiny ImageNet, and for 80 epochs on ImageNet-1K subsets. We consistently use 5 expert models for all datasets as the default setting. The number of iterations for Distribution Calibration is fixed at 1 across all datasets. During the condensing process, the SGD optimizer is set with a learning rate of 0.1 for ImageNet-1K subsets and 0.01 for other datasets, with the learning rate being scaled by the number of images per class (IPC). Following IDC [Kim et al., 2022], we train the networks using a sequence of color transformation, cropping, and CutMix [Yun et al., 2019]. The factor parameter l is set to 2 for low-resolution datasets and Tiny ImageNet, and 3 for ImageNet-1K subsets. All synthetic data are initially generated from randomly selected real data to expedite optimization. |
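
The quoted setup boils down to two SGD configurations: one for training the expert models and one for optimizing the synthetic images, with the condensation learning rate scaled by images per class (IPC). The sketch below is a minimal, hedged illustration assuming a PyTorch implementation (the paper names no framework or versions); the function names, the linear form of the IPC scaling, and the example shapes are assumptions, while the numeric hyperparameters come from the quote above.

```python
# Minimal sketch of the quoted hyperparameters, assuming PyTorch.
# Function names, the linear IPC scaling, and the example shapes are
# illustrative assumptions, not taken from the authors' released code.
import torch
import torch.nn as nn


def make_expert_optimizer(model: nn.Module) -> torch.optim.SGD:
    # Quoted settings for training the expert models theta_expert
    # (60 epochs on low-resolution data / Tiny ImageNet, 80 on ImageNet-1K subsets).
    return torch.optim.SGD(
        model.parameters(),
        lr=0.01,            # quoted learning rate
        momentum=0.9,       # quoted momentum
        weight_decay=5e-4,  # quoted weight decay (0.0005)
    )


def make_condense_optimizer(syn_images: torch.Tensor, ipc: int,
                            imagenet_subset: bool = False) -> torch.optim.SGD:
    # Quoted base learning rates: 0.1 for ImageNet-1K subsets, 0.01 otherwise,
    # "scaled by the number of images per class (IPC)"; linear scaling is an
    # assumption about the exact rule.
    base_lr = 0.1 if imagenet_subset else 0.01
    return torch.optim.SGD([syn_images.requires_grad_()], lr=base_lr * ipc)


if __name__ == "__main__":
    # Example: a CIFAR-10-sized synthetic set with IPC = 10, initialized from
    # noise here (the paper initializes from randomly selected real images).
    ipc, num_classes = 10, 10
    syn = torch.randn(num_classes * ipc, 3, 32, 32)
    opt = make_condense_optimizer(syn, ipc)
    print(opt.param_groups[0]["lr"])  # 0.01 * 10 = 0.1
```

Per the quote, a faithful run would additionally initialize the synthetic images from randomly selected real images and apply the IDC-style augmentation sequence (color transformation, cropping, CutMix) when training the evaluation networks.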