Multisize Dataset Condensation

Authors: Yang He, Lingao Xiao, Joey Tianyi Zhou, Ivor Tsang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments validate our findings on networks including ConvNet, ResNet, and DenseNet, and datasets including SVHN, CIFAR-10, CIFAR-100, and ImageNet.
Researcher Affiliation | Academia | 1 CFAR, Agency for Science, Technology and Research, Singapore; 2 IHPC, Agency for Science, Technology and Research, Singapore; 3 School of Computer Science and Engineering, Nanyang Technological University
Pseudocode | Yes | Algo. 1 provides the algorithm of the proposed MDC method. (Algorithm 1: Multisize Dataset Condensation)
Open Source Code | Yes | Code is available at: https://github.com/he-y/Multisize-Dataset-Condensation.
Open Datasets | Yes | SVHN (Netzer et al., 2011) contains street digits of shape 32×32×3. The dataset contains 10 classes, covering digits from 0 to 9. The training set has 73,257 images, and the test set has 26,032 images. CIFAR-10 (Krizhevsky et al., 2009) contains images of shape 32×32×3 and has 10 classes in total... The training set has 5,000 images per class and the test set has 1,000 images per class, containing in total 50,000 training images and 10,000 testing images. CIFAR-100 (Krizhevsky et al., 2009) contains images of shape 32×32×3 and has 100 classes in total. Each class contains 500 images for training and 100 images for testing, leading to a total of 50,000 training images and 10,000 testing images. ImageNet-10 (Deng et al., 2009) is a subset of ImageNet-1K (Deng et al., 2009) containing images with an average of 469×387×3 pixels, reshaped to a resolution of 224×224×3. It contains 1,280 training images per class on average and a total of 50,000 images for testing (validation set).
Dataset Splits | Yes | The training set has 73,257 images, and the test set has 26,032 images. (SVHN) The training set has 5,000 images per class and the test set has 1,000 images per class, containing in total 50,000 training images and 10,000 testing images. (CIFAR-10) Each class contains 500 images for training and 100 images for testing, leading to a total of 50,000 training images and 10,000 testing images. (CIFAR-100) It contains 1,280 training images per class on average and a total of 50,000 images for testing (validation set). (ImageNet-10) A hedged dataset-loading sketch follows the table.
Hardware Specification | No | The paper does not specify any particular hardware used for the experiments, such as GPU models, CPU types, or cloud computing instances with their specifications.
Software Dependencies | No | The paper mentions using IDC (Kim et al., 2022b) as a basic condensation method and the SGD optimizer, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | For CIFAR-10, CIFAR-100, and SVHN, we use a batch size of 128 for IPC ≤ 30 and a batch size of 256 for IPC > 30. For ImageNet-10 IPC 20, we use a batch size of 256. The network is randomly initialized 2,000 times for CIFAR-10, CIFAR-100, and SVHN, and 500 times for ImageNet-10; for each initialization, the network is trained for 100 epochs. For both ConvNet-D3 and ResNet10-AP, the learning rate is 0.01 with 0.9 momentum and 0.0005 weight decay. The SGD optimizer and a multi-step learning-rate scheduler are used. The network is trained for 1,000 epochs. The last-layer feature is used for the feature distance calculation. The computed feature distance is averaged across 100 inner-loop training epochs for a specific outer loop. For CIFAR-10, CIFAR-100, and SVHN, the feature distance is calculated at intervals of every t = 100 outer loops; for ImageNet-10, t = 50. We follow Eq. 10 for MLS freezing. We perform augmentation when training networks in both condensation and evaluation, using coloring, cropping, flipping, scaling, rotating, and mixup. When updating network parameters, image augmentations differ for each image in a batch; when updating synthetic images, the same augmentations are applied to the synthetic images and the corresponding real images in a batch. For all results we use IDC (Kim et al., 2022b) as the basic condensation method unless otherwise stated. Following its setup, we use a multi-formation factor of 2 for the SVHN, CIFAR-10, and CIFAR-100 datasets and a factor of 3 for ImageNet-10. Hedged sketches of this training configuration and of the feature-distance computation follow the table.
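
To make the reported dataset splits concrete, here is a minimal sketch that loads the three small public datasets with standard torchvision loaders and prints the split sizes quoted above. The `root` directory is a placeholder, and ImageNet-10 is omitted because it is a custom subset constructed in the MDC codebase rather than a stock torchvision dataset.

```python
# Minimal sketch: load the public datasets and check the reported splits.
from torchvision import datasets

root = "./data"  # placeholder download directory

svhn_train = datasets.SVHN(root, split="train", download=True)   # 73,257 images
svhn_test = datasets.SVHN(root, split="test", download=True)     # 26,032 images

cifar10_train = datasets.CIFAR10(root, train=True, download=True)    # 50,000 images
cifar10_test = datasets.CIFAR10(root, train=False, download=True)    # 10,000 images

cifar100_train = datasets.CIFAR100(root, train=True, download=True)  # 50,000 images
cifar100_test = datasets.CIFAR100(root, train=False, download=True)  # 10,000 images

for name, ds in [("SVHN/train", svhn_train), ("SVHN/test", svhn_test),
                 ("CIFAR-10/train", cifar10_train), ("CIFAR-10/test", cifar10_test),
                 ("CIFAR-100/train", cifar100_train), ("CIFAR-100/test", cifar100_test)]:
    print(f"{name}: {len(ds)} images")
```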
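The evaluation-training recipe quoted in the Experiment Setup row (SGD with learning rate 0.01, momentum 0.9, weight decay 0.0005, a multi-step learning-rate scheduler, 1,000 epochs, batch size 128) maps directly onto standard PyTorch components. The sketch below wires those reported hyperparameters together; the tiny stand-in model, the random stand-in data, and the scheduler milestones are illustrative assumptions, not values from the paper.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model (assumption): the paper trains ConvNet-D3 / ResNet10-AP.
model = nn.Sequential(
    nn.Conv2d(3, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 10),
)

# Stand-in data (assumption): in practice this is the condensed synthetic set.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))),
    batch_size=128,  # reported batch size for IPC <= 30
)

# Reported optimizer: SGD, lr 0.01, momentum 0.9, weight decay 0.0005.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
# Reported scheduler type: multi-step; these milestones are assumptions.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[600, 800], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(1000):  # reported training length
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```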
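The setup also mentions computing a feature distance from last-layer features to decide which subset to freeze (the "most learnable subset", MLS, via Eq. 10). The exact selection rule is given in the paper and its code; the sketch below only illustrates the ingredient this excerpt describes, comparing the last-layer features of a nested subset against those of the full synthetic set. The backbone, the toy data, and the mean-feature L2 distance are all simplifying assumptions.

```python
# Hedged sketch of the feature-distance ingredient described in the setup.
# Not the paper's exact selection/freezing rule (Eq. 10).
import torch
from torch import nn

feature_net = nn.Sequential(  # stand-in for the condensation network's backbone
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

synthetic = torch.randn(10, 3, 32, 32)  # toy condensed images for one class

@torch.no_grad()
def subset_feature_distance(images: torch.Tensor, k: int) -> float:
    """L2 distance between the mean last-layer feature of the first-k subset
    and that of the full synthetic set."""
    full = feature_net(images).mean(dim=0)
    sub = feature_net(images[:k]).mean(dim=0)
    return torch.norm(full - sub).item()

# Compare nested subsets; a smaller distance means the subset better
# represents the whole synthetic set under this toy criterion.
for k in range(1, len(synthetic) + 1):
    print(f"subset size {k}: feature distance {subset_feature_distance(synthetic, k):.4f}")
```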