Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility
Authors: Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we show in our empirical study that GIST outperforms state-of-the-art benchmarks for a single-shot data sampling task on ImageNet. |
| Researcher Affiliation | Industry | Matthew Fahrbach Google EMAIL Srikumar Ramalingam Google EMAIL Morteza Zadimoghaddam Google EMAIL Sara Ahmadian Google EMAIL Gui Citovsky Google EMAIL Giulia DeSalvo Google EMAIL |
| Pseudocode | Yes | Algorithm 1 Max-min diversification with submodular utility via greedy weighted independent sets. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The experimental setup and hyperparameters are provided in Section 5 and Appendix E. |
| Open Datasets | Yes | Our ImageNet data sampling experiment compares the top-1 image classification accuracy achieved by different single-shot subset selection algorithms. Setup. We use the standard vision dataset ImageNet [54] containing ~1.3 million images and 1000 classes. |
| Dataset Splits | Yes | We select 10% of the images uniformly at random and use them to train an initial ResNet-56 model θ0 [29]. |
| Hardware Specification | No | The end-to-end running time is dominated by ImageNet model training, which takes more than a few hours even with several accelerators (e.g., GPU/TPU chips). |
| Software Dependencies | No | We use SGD with Nesterov momentum 0.9 with 450/90 epochs. The base learning rate is 0.1, and is reduced by a tenth at 5, 30, 69, and 80. We extract the penultimate layer features to produce 2048-dimensional embeddings of each image. |
| Experiment Setup | Yes | We use SGD with Nesterov momentum 0.9 with 450/90 epochs. The base learning rate is 0.1, and is reduced by a tenth at 5, 30, 69, and 80. We extract the penultimate layer features to produce 2048-dimensional embeddings of each image. We use the same hyperparameters as the original ResNet paper [29] with budgets and one-shot subset selection experiments designed in the same manner as [49]. |
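
The Dataset Splits and Experiment Setup rows quote two concrete settings: a uniformly random 10% initial subset and a step learning-rate schedule with base rate 0.1 and drops at 5, 30, 69, and 80. The following is a minimal sketch of how those two settings might be expressed, not the authors' code. It assumes that "reduced by a tenth" means multiplying the rate by 0.1, that the listed numbers are 0-indexed epochs, and that a fixed seed is used for sampling; the function names `step_lr` and `sample_initial_subset` are hypothetical.

```python
import random


def step_lr(epoch, base_lr=0.1, milestones=(5, 30, 69, 80), factor=0.1):
    """Learning rate at a given epoch under a step schedule.

    Assumption: the rate is multiplied by `factor` once for every
    milestone epoch that has been reached (the quoted text does not
    spell this out).
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


def sample_initial_subset(num_examples, fraction=0.1, seed=0):
    """Indices of a uniformly random subset, drawn without replacement.

    Assumption: the seed and the rounding of the subset size are
    illustrative choices, not taken from the paper.
    """
    rng = random.Random(seed)
    k = round(num_examples * fraction)
    return sorted(rng.sample(range(num_examples), k))


if __name__ == "__main__":
    for epoch in (0, 5, 30, 69, 80):
        print(epoch, step_lr(epoch))
    print(len(sample_initial_subset(1000)))
```

In a training framework, the same schedule would typically be handed to a built-in multi-step scheduler rather than computed by hand; the pure-Python version here only makes the quoted numbers concrete.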