Sparse Parameterization for Epitomic Dataset Distillation
Authors: Xing Wei, Anjia Cao, Funing Yang, Zhiheng Ma
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the superiority of SPEED in handling high-resolution datasets, achieving state-of-the-art performance on multiple benchmarks and downstream applications. Our framework is compatible with a variety of dataset matching approaches, generally enhancing their performance. |
| Researcher Affiliation | Academia | 1 School of Software Engineering, Xi'an Jiaotong University; 2 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences |
| Pseudocode | Yes | Algorithm 1 Sparse Parameterization for Epitomic Dataset Distillation (SPEED). Input: T: original dataset; N: total number of synthetic images; H: number of heads; k: expected number of non-zero elements of SCM; SPARSIFY(·, ·): feature sparsification. |
| Open Source Code | Yes | Source code is available at https://github.com/MIV-XJTU/SPEED. |
| Open Datasets | Yes | We evaluate our methods on the following datasets: i) CIFAR10 [48]: A standard image dataset consisting of 60,000 32×32 RGB images in 10 different classes, including airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. For each class, 5,000 images are used for training and 1,000 images are used for testing. ii) CIFAR100 [48]: CIFAR100 contains 100 classes. It has a training set with 50,000 images and a testing set with 10,000 images. iii) Tiny ImageNet [49]: A 64×64 image dataset with 200 classes. Each class has 500 images for training and 50 images for testing. iv) ImageNet [50] subsets: High-resolution (128×128) datasets from ILSVRC2012 [50]. |
| Dataset Splits | No | The paper provides training and testing splits for CIFAR10, CIFAR100, and Tiny ImageNet (e.g., '5000 images are used for training and 1000 images are used for testing' for CIFAR10), but it does not explicitly mention a separate validation split or how one was used, which would be needed to reproduce the experiments exactly. |
| Hardware Specification | Yes | Our experiments were run on a mixture of RTX 3090, RTX A6000, and A800 GPUs. |
| Software Dependencies | No | The paper mentions using 'Kornia' for ZCA whitening and 'SGD optimizer', but it does not specify version numbers for these or other key software components, such as PyTorch or Python. |
| Experiment Setup | Yes | To quantify the performance and guarantee the fairness of the comparison, we use the default Conv-InstanceNorm-ReLU-AvgPool ConvNet with 128 channels as our training backbone, consistent with previous methods. We adopt trajectory matching [16] as our default matching objective. ... we employ a 3-layer ConvNet and a 4-layer ConvNet for training and evaluating CIFAR and Tiny ImageNet, respectively. ... For CIFAR10 (32×32), CIFAR100 (32×32), and Tiny ImageNet (64×64), we fix the default number of patches J to 64, and increase it to 256 for ImageNet subsets (128×128). ... Following the mainstream evaluation settings, we train 5 randomly initialized evaluation networks on the synthetic dataset, using the SGD optimizer, where the momentum and weight decay are set to 0.9 and 0.0005, respectively. |
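The pseudocode row above references a SPARSIFY(·, ·) step that enforces an expected number k of non-zero elements in the SCM (sparse coding matrix). The table does not spell out the selection criterion, so the following is a minimal sketch assuming SPARSIFY keeps, per row, the k entries with the largest magnitude; the function name and per-row semantics are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sparsify(coeffs, k):
    """Per-row top-k magnitude sparsification (assumed SPARSIFY semantics).

    Keeps the k largest-|value| entries in each row of `coeffs`
    and zeros out the rest, so each row has exactly k non-zeros.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    out = np.zeros_like(coeffs)
    for i, row in enumerate(coeffs):
        # Indices of the k entries with the largest absolute value.
        keep = np.argsort(np.abs(row))[-k:]
        out[i, keep] = row[keep]
    return out
```

A row like `[0.1, -2.0, 0.5, 0.05]` with `k=2` retains only `-2.0` and `0.5`; everything else becomes zero, which is what bounds the storage cost of the synthetic-image coefficients.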
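The experiment-setup row fixes the evaluation optimizer to SGD with momentum 0.9 and weight decay 0.0005. As a reference for reproducers, here is a minimal sketch of that update rule in numpy, using PyTorch-style semantics (L2 weight decay folded into the gradient before the momentum buffer update); the learning rate is a hypothetical placeholder, since the table does not state it.

```python
import numpy as np

def sgd_step(w, grad, buf, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD-with-momentum step, PyTorch-style.

    weight_decay adds an L2 penalty gradient, momentum accumulates
    into `buf`, and the parameters move against the buffer.
    `lr` is an assumed value; the paper's table does not specify it.
    """
    g = grad + weight_decay * w   # L2 weight decay term
    buf = momentum * buf + g      # momentum buffer update
    w = w - lr * buf              # parameter update
    return w, buf
```

Repeating this step with the stated momentum 0.9 and weight decay 0.0005 reproduces the optimizer behavior the evaluation networks are trained with, up to the unspecified learning-rate schedule.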