Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility

Authors: Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we show in our empirical study that GIST outperforms state-of-the-art benchmarks for a single-shot data sampling task on ImageNet.
Researcher Affiliation	Industry	Matthew Fahrbach Google EMAIL Srikumar Ramalingam Google EMAIL Morteza Zadimoghaddam Google EMAIL Sara Ahmadian Google EMAIL Gui Citovsky Google EMAIL Giulia DeSalvo Google EMAIL
Pseudocode	Yes	Algorithm 1 Max-min diversification with submodular utility via greedy weighted independent sets.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The experimental setup and hyperparameters are provided in Section 5 and Appendix E.
Open Datasets	Yes	Our ImageNet data sampling experiment compares the top-1 image classification accuracy achieved by different single-shot subset selection algorithms. Setup. We use the standard vision dataset ImageNet [54] containing ~1.3 million images and 1000 classes.
Dataset Splits	Yes	We select 10% of the images uniformly at random and use them to train an initial ResNet-56 model θ0 [29].
Hardware Specification	No	The end-to-end running time is dominated by ImageNet model training, which takes more than a few hours even with several accelerators (e.g., GPU/TPU chips).
Software Dependencies	No	We use SGD with Nesterov momentum 0.9 with 450/90 epochs. The base learning rate is 0.1, and is reduced by a tenth at 5, 30, 69, and 80. We extract the penultimate layer features to produce 2048-dimensional embeddings of each image.
Experiment Setup	Yes	We use SGD with Nesterov momentum 0.9 with 450/90 epochs. The base learning rate is 0.1, and is reduced by a tenth at 5, 30, 69, and 80. We extract the penultimate layer features to produce 2048-dimensional embeddings of each image. We use the same hyperparameters as the original ResNet paper [29] with budgets and one-shot subset selection experiments designed in the same manner as [49].
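The Pseudocode row only names Algorithm 1. As a rough illustration of the thresholding idea in the title, here is a minimal Python sketch; it is not the authors' Algorithm 1. It assumes the objective f(S) = g(S) + λ·div(S), where g is a submodular utility passed in as a callable over point indices and div(S) is the smallest pairwise Euclidean distance in S (taken as 0 when |S| < 2). For each distance threshold d, it greedily builds a set whose points are pairwise at least d apart and keeps the best set under f. It also assumes an exhaustive sweep over every distinct pairwise distance instead of whatever threshold grid the paper uses. The names `div`, `greedy_independent_set`, and `gist` are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Utility = Callable[[List[int]], float]  # hypothetical g(S) over point indices


def div(points: Sequence[Point], S: List[int]) -> float:
    """Max-min diversity: smallest pairwise distance in S (0.0 if |S| < 2, an assumption)."""
    if len(S) < 2:
        return 0.0
    return min(math.dist(points[i], points[j]) for i, j in itertools.combinations(S, 2))


def greedy_independent_set(points: Sequence[Point], g: Utility, d: float, k: int) -> List[int]:
    """Greedily pick up to k points by marginal gain in g, keeping all picks at least d apart."""
    S: List[int] = []
    candidates = set(range(len(points)))
    while len(S) < k and candidates:
        base = g(S)
        # sorted() makes tie-breaking deterministic: the smallest index wins.
        best = max(sorted(candidates), key=lambda v: g(S + [v]) - base)
        S.append(best)
        # Discard the pick and every remaining point closer than d to it.
        candidates = {v for v in candidates
                      if v != best and math.dist(points[v], points[best]) >= d}
    return S


def gist(points: Sequence[Point], g: Utility, k: int, lam: float) -> List[int]:
    """Best set found over all thresholds under f(S) = g(S) + lam * div(S)."""
    def f(S: List[int]) -> float:
        return g(S) + lam * div(points, S)

    best = greedy_independent_set(points, g, 0.0, k)  # d = 0: plain greedy on g
    # Assumption: try every distinct pairwise distance as a threshold.
    thresholds = sorted({math.dist(p, q) for p, q in itertools.combinations(points, 2)})
    for d in thresholds:
        T = greedy_independent_set(points, g, d, k)
        if f(T) > f(best):
            best = T
    return best


# Toy usage with a modular (hence submodular) utility on points along a line.
weights = [5.0, 4.0, 4.0, 1.0]
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
g = lambda S: sum(weights[i] for i in S)
print(gist(points, g, k=2, lam=1.0))  # [0, 3]
```

With lam=0.0 the same call falls back to the utility-only greedy choice [0, 1]. Raising lam trades utility for spread, which is why the far-away but low-weight point 3 is selected here.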
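To make the quoted learning-rate schedule concrete, here is a minimal sketch of a step (piecewise-constant) schedule. Several parts of it are assumptions rather than the paper's code. It reads "reduced by a tenth" as multiplying the rate by 0.1 at each listed epoch. It takes the milestones 5, 30, 69, and 80 literally from the quoted text. It assumes 0-indexed epochs, with each drop taking effect at the start of the milestone epoch. The name `step_lr` is hypothetical.

```python
from typing import Sequence


def step_lr(epoch: int,
            base_lr: float = 0.1,
            milestones: Sequence[int] = (5, 30, 69, 80),
            factor: float = 0.1) -> float:
    """Learning rate at a 0-indexed epoch: base_lr * factor ** (milestones reached)."""
    # Assumption: the rate drops at the start of each milestone epoch (epoch >= m).
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


for e in (0, 5, 30, 69, 80):
    print(e, step_lr(e))
```

Keeping `milestones` and `factor` as parameters makes it easy to test the alternative reading (for example, `factor=0.9` for a 10% decrease) without changing the function.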