Sequential Subset Matching for Dataset Distillation

Authors: Jiawei Du, Qin Shi, Joey Tianyi Zhou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed SeqMatch outperforms state-of-the-art methods in various datasets, including SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet. Our code is available at https://github.com/shqii1j/seqmatch. Experiments on diverse datasets demonstrate the effectiveness of SeqMatch, achieving state-of-the-art performance.
Researcher Affiliation | Academia | Jiawei Du, Qin Shi, Joey Tianyi Zhou; Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore; Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; {dujw,Joey_Zhou}@cfar.a-star.edu.sg, shiqin924924@gmail.com
Pseudocode | Yes | Algorithm 1: Training with SeqMatch in Distillation Phase.
Open Source Code | Yes | Our code is available at https://github.com/shqii1j/seqmatch.
Open Datasets | Yes | Datasets: We evaluate the performance of dataset distillation methods on several widely-used datasets across various resolutions. MNIST [28]... SVHN [36]... CIFAR10 and CIFAR100 [25]... Tiny ImageNet [27]... ImageNet [24] subsets...
Dataset Splits | No | The paper states 'The optimal value of hyperparameter K is obtained via grid searches within the set {2, 3, 4, 5, 6} in a validation set within the CIFAR-10 dataset.' This confirms the use of a validation set but does not specify how the split was created (e.g., percentages or sample counts for the training, validation, and test sets).
Hardware Specification | Yes | We conduct our experiments on the server with four Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using ConvNet and ResNet, but does not specify software dependencies such as Python, PyTorch/TensorFlow, or CUDA versions.
Experiment Setup | Yes | To ensure the reproducibility of SeqMatch, we provide detailed implementation specifications. Our method relies on a single hyperparameter, denoted by K, which determines the number of subsets. In order to balance the inclusion of sufficient knowledge in each segment with the capture of high-level features in the later stages, we set K = {2, 3} for the scenarios where ipc = {10, 50}, respectively... Table 4: Hyperparameter values we used for SeqMatch-MTT in the main result table. Because the hyperparameters Max Start Epoch and Synthetic Step vary across subsets, we use a sequence of numbers to denote the values used in the corresponding subsets. Img. is the abbreviation of ImageNet.
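
The Experiment Setup and Dataset Splits entries above describe splitting the synthetic set into K subsets and selecting K on a CIFAR-10 validation set, without giving the mechanics. The following is a minimal sketch, not the authors' implementation. It assumes the ipc images of each class are divided into K contiguous, near-equal subsets, and that the validation set is a seeded 10% holdout of the 50,000 CIFAR-10 training images. The function names, the equal-size partition, the holdout fraction, and the seed are all hypothetical.

```python
import random


def split_ipc_into_subsets(ipc, k):
    """Partition `ipc` per-class image slots into `k` contiguous subsets.

    Assumption: subsets are as equal in size as possible; the summary
    above does not say how the paper sizes them.
    """
    if not 1 <= k <= ipc:
        raise ValueError("k must be between 1 and ipc")
    base, extra = divmod(ipc, k)
    subsets, start = [], 0
    for i in range(k):
        # The first `extra` subsets absorb one leftover slot each.
        size = base + (1 if i < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def holdout_split(num_samples, val_fraction=0.1, seed=0):
    """Shuffle indices with a fixed seed and hold out a validation slice.

    Assumption: the fraction and seed are illustrative; the paper does
    not report how its validation set was built.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(num_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


# Settings quoted in the table: K = 2 for ipc = 10, K = 3 for ipc = 50.
REPORTED_K = {10: 2, 50: 3}
# Candidate values the paper reports searching over.
K_GRID = [2, 3, 4, 5, 6]

if __name__ == "__main__":
    for ipc, k in REPORTED_K.items():
        sizes = [len(s) for s in split_ipc_into_subsets(ipc, k)]
        print(f"ipc={ipc}, K={k}, subset sizes={sizes}")
    train_idx, val_idx = holdout_split(50_000)
    print(len(train_idx), len(val_idx))
```

Recording the seed and fraction alongside the chosen K would supply the split details that the Dataset Splits entry marks as missing.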