Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

Authors: George Cazenavette, Antonio Torralba, Vincent Sitzmann

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments and ablations validate our Linear Gradient Matching method's effectiveness on this new dataset distillation task and highlight its potential as an interpretability tool.
Researcher Affiliation	Academia	George Cazenavette, Antonio Torralba, Vincent Sitzmann (Massachusetts Institute of Technology)
Pseudocode	No	The paper describes the Linear Gradient Matching method using formal equations and detailed textual descriptions in Section 3 and Appendix A.1, but it does not present any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code and distilled datasets can be found on our project page: georgecazenavette.github.io/linear-gm
Open Datasets	Yes	We evaluate our method on various datasets, including ImageNet-1k [12] and ImageNet-100 [42] for our primary results, Spawrious [26] and Waterbirds [38] for a study on adversarial datasets, Stanford Dogs [21] and CUB-200-2011 [45] for fine-grained visual classification, and ArtBench [24] to test the method's out-of-distribution capabilities.
Dataset Splits	Yes	To measure our method's performance on a given feature extractor, we randomly initialize a linear classifier and optimize it to convergence using the distilled images before evaluating on the test set. The same procedure is used to evaluate real-image baselines. ... For the training set, we perform the same set of augmentations as during distillation (horizontal flip, random resized crop, and Gaussian noise). The output of the random resized crop is of size 224×224. For the test set, we resize the shortest side to 256 and then do a center crop of 224×224.
Hardware Specification	Yes	We used a variety of GPUs for this work depending on what was available on the shared cluster. Specifically, we used a combination of H200, A100, L40S, Ada6000, and 4090 GPUs.
Software Dependencies	Yes	We implement our method in PyTorch [2, 33]... We optimize our pyramid representations using Adam [22]... We initially used the Kornia [36] implementations...
Experiment Setup	Yes	In all our experiments, we distill the given dataset for 5000 iterations before training linear probes to convergence on the resulting synthetic images. All experiments are conducted at 224×224 resolution and use the ViT-B version of the given model. All distilled datasets by default use 10 sets of augmentations per batch except for ImageNet-1k, for which only 3 sets of augmentations are used due to compute constraints. ... We optimize our pyramid representations using Adam [22] with a learning rate of 0.002. ... We then train the linear classifier for 1000 epochs with a batch size of 100. We use an Adam optimizer with a learning rate of 0.001/256 ... along with a cosine decay learning rate schedule. We stop training early if the test accuracy has not improved over the last 50 epochs.
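The image geometry described in the Dataset Splits row (a random resized crop to 224×224 for training, and a shortest-side resize to 256 followed by a 224×224 center crop for testing) can be illustrated as crop-box arithmetic. The following is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' code: the function names are hypothetical, and the `scale=(0.08, 1.0)` and `ratio=(3/4, 4/3)` ranges are torchvision's defaults rather than values reported in the source (which mentions Kornia implementations). Horizontal flipping and Gaussian noise are omitted because their parameters are not given. The sketch only computes box coordinates; a real pipeline would then crop the pixels and resize the crop to 224×224.

```python
import math
import random
from typing import Tuple


def sample_random_resized_crop(
    width: int,
    height: int,
    rng: random.Random,
    scale: Tuple[float, float] = (0.08, 1.0),    # assumed torchvision default
    ratio: Tuple[float, float] = (3 / 4, 4 / 3),  # assumed torchvision default
) -> Tuple[int, int, int, int]:
    """Return (top, left, crop_h, crop_w) for one random resized crop."""
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(10):
        # Sample a target area fraction and a log-uniform aspect ratio.
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)  # randint includes both endpoints
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback: a central crop whose aspect ratio is clamped to the allowed range.
    in_ratio = width / height
    if in_ratio < ratio[0]:
        w, h = width, int(round(width / ratio[0]))
    elif in_ratio > ratio[1]:
        w, h = int(round(height * ratio[1])), height
    else:
        w, h = width, height
    return (height - h) // 2, (width - w) // 2, h, w


def eval_resize_center_crop(
    width: int, height: int, short_side: int = 256, crop: int = 224
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Resize so the shorter side equals short_side, then take a centered crop.

    Returns ((new_w, new_h), (left, top, right, bottom)). The long side is
    truncated and crop offsets are rounded, mirroring torchvision's
    Resize/CenterCrop arithmetic (an assumption about the exact convention).
    """
    if width <= height:
        new_w, new_h = short_side, int(short_side * height / width)
    else:
        new_w, new_h = int(short_side * width / height), short_side
    left = int(round((new_w - crop) / 2.0))
    top = int(round((new_h - crop) / 2.0))
    return (new_w, new_h), (left, top, left + crop, top + crop)
```

For example, `eval_resize_center_crop(500, 375)` resizes a 500×375 image to 341×256 and returns the crop box (58, 16, 282, 240).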
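The linear-probe schedule in the Experiment Setup row (up to 1000 epochs, cosine learning-rate decay, and early stopping after 50 epochs without test-accuracy improvement) can be sketched as a small training-loop driver. This is a hedged, stdlib-only Python sketch, not the authors' implementation: `train_and_eval_epoch` is a hypothetical callback standing in for one epoch of Adam training at batch size 100 plus a test-set evaluation; the cosine decay is assumed to be applied per epoch with no warmup and to approach zero at the final epoch; and the base rate 0.001/256 is used literally, since the source elides whatever follows it.

```python
import math
from typing import Callable, List, Tuple


def cosine_lr(epoch: int, total_epochs: int, base_lr: float) -> float:
    """Cosine decay from base_lr at epoch 0 toward 0 at total_epochs."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))


def run_linear_probe_schedule(
    train_and_eval_epoch: Callable[[int, float], float],
    total_epochs: int = 1000,
    base_lr: float = 0.001 / 256,
    patience: int = 50,
) -> Tuple[int, float, List[float]]:
    """Drive probe training with cosine LR decay and accuracy-based early stopping.

    train_and_eval_epoch(epoch, lr) is hypothetical: it should train the linear
    classifier for one epoch at learning rate lr and return test accuracy.
    Returns (best_epoch, best_accuracy, learning_rates_used).
    """
    best_acc, best_epoch = float("-inf"), -1
    lrs: List[float] = []
    for epoch in range(total_epochs):
        lr = cosine_lr(epoch, total_epochs, base_lr)
        lrs.append(lr)
        acc = train_and_eval_epoch(epoch, lr)
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # Test accuracy has not improved during the last `patience` epochs.
            break
    return best_epoch, best_acc, lrs
```

With a toy callback whose accuracy stops improving after epoch 9, the driver stops at epoch 59, exactly 50 epochs after the last improvement, having used 60 learning-rate values.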