Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sequential Subset Matching for Dataset Distillation
Authors: Jiawei Du, Qin Shi, Joey Tianyi Zhou
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed SeqMatch outperforms state-of-the-art methods in various datasets, including SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet. Our code is available at https://github.com/shqii1j/seqmatch. Experiments on diverse datasets demonstrate the effectiveness of SeqMatch, achieving state-of-the-art performance. |
| Researcher Affiliation | Academia | Jiawei Du, Qin Shi, Joey Tianyi Zhou; Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore; Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore |
| Pseudocode | Yes | Algorithm 1 Training with SeqMatch in Distillation Phase. |
| Open Source Code | Yes | Our code is available at https://github.com/shqii1j/seqmatch. |
| Open Datasets | Yes | Datasets: We evaluate the performance of dataset distillation methods on several widely-used datasets across various resolutions. MNIST [28]... SVHN [36]... CIFAR10 and CIFAR100 [25]... Tiny ImageNet [27]... ImageNet [24] subsets... |
| Dataset Splits | No | The paper states 'The optimal value of hyperparameter K is obtained via grid searches within the set {2, 3, 4, 5, 6} in a validation set within the CIFAR-10 dataset.' This confirms the use of a validation set but does not provide specific details on how this split was created (e.g., percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | Yes | We conduct our experiments on the server with four Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using ConvNet and ResNet, but does not specify software dependencies like Python, PyTorch/TensorFlow, or CUDA versions. |
| Experiment Setup | Yes | To ensure the reproducibility of SeqMatch, we provide detailed implementation specifications. Our method relies on a single hyperparameter, denoted by K, which determines the number of subsets. In order to balance the inclusion of sufficient knowledge in each segment with the capture of high-level features in the later stages, we set K = {2, 3} for the scenarios where ipc = {10, 50}, respectively... Table 4: Hyperparameter values we used for SeqMatch-MTT in the main result table. Most of the hyperparameters Max Start Epoch and Synthetic Step are various with the subsets, we use a sequential numbers to denote the parameters used in the corresponding subsets. Img. denotes the abbreviation of ImageNet. |
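
As a rough illustration of the setup quoted in the Experiment Setup row, the sketch below maps images per class (ipc) to the number of subsets K using the quoted values, then partitions the per-class synthetic image indices into K subsets. Only the ipc-to-K values come from the quoted text. The even, contiguous split and the names `K_BY_IPC` and `split_into_subsets` are assumptions made for illustration; they are not taken from the paper or its repository.

```python
# Values from the quoted setup: K = {2, 3} for ipc = {10, 50}.
# The dictionary name is hypothetical.
K_BY_IPC = {10: 2, 50: 3}


def split_into_subsets(ipc: int, k: int) -> list[list[int]]:
    """Partition image indices 0..ipc-1 into k contiguous, near-equal subsets.

    Assumed scheme for illustration only; the paper's actual partitioning
    may differ.
    """
    if not 1 <= k <= ipc:
        raise ValueError("k must be between 1 and ipc")
    base, extra = divmod(ipc, k)
    subsets = []
    start = 0
    for i in range(k):
        # The first `extra` subsets each receive one additional index.
        size = base + (1 if i < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


if __name__ == "__main__":
    for ipc, k in K_BY_IPC.items():
        sizes = [len(s) for s in split_into_subsets(ipc, k)]
        print(f"ipc={ipc}, K={k}, subset sizes={sizes}")
```

Under this assumed scheme, each subset would be optimized in turn, in index order; consult the linked repository for the actual procedure.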