Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Discovering Data Structures: Nearest Neighbor Search and Beyond

Authors: Omar Salemohamed, Laurent Charlin, Shivam Garg, Vatsal Sharan, Gregory Valiant

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our end-to-end model (referred to as E2E) on 1-dimensional, 2-dimensional, and high-dimensional NN search. We primarily focus on data structures that do not use extra space, but we also explore scenarios with additional space in App B.12. We compare against suitable NN data structures in each setting (e.g., sorting followed by binary search in 1D), and also against several ablations to study the impact of various model components. Our findings are: Learning Classical Algorithms (Sections 2.2, 2.3) For 1D nearest neighbor search, the data-processing network learns to sort, while the query network simultaneously learns to search over the sorted data. ... Useful representations in high dimensions (Section 2.4) ... Beyond NN search (Section 3) ...
Researcher Affiliation	Collaboration	Omar Salemohamed Université de Montréal, Mila Laurent Charlin HEC Montréal, Mila Shivam Garg Microsoft Research Vatsal Sharan University of Southern California Gregory Valiant Stanford University
Pseudocode	No	The paper describes the model architecture and training process in detail across sections 1.1, 2, and Appendices B and C, but does not present the steps in a formal pseudocode or algorithm block.
Open Source Code	Yes	Our code is available at: https://github.com/omar-s1/data_structure_discovery
Open Datasets	Yes	We include additional high-dimensional (100D) experiments on two standard approximate NN benchmarks, Fashion MNIST and SIFT [10], to demonstrate the model s performance on realistic data (see App B.10 for more details). ... When trained on an extended 3-digit MNIST dataset, the model identifies features that capture number order, sorts images accordingly, and searches within the sorted set all learned jointly from scratch!
Dataset Splits	Yes	For experiments with fixed training sets (e.g. 3-Digit MNIST/Fashion MNIST/SIFT), we resampled from their corresponding training sets (sizes 60k/60k/1M, respectively) and used their test sets for evaluation. Given that the models perform well on test data for these distributions, it suggests that we do not need as much unique data as we use in the online setting. For each dataset, we use the train split to train our model (as well as the learning-to-hash baselines) and for evaluation we use the test split.
Hardware Specification	Yes	All experiments are run on a single NVIDIA RTX8000 GPU.
Software Dependencies	No	The paper mentions the use of 'Adam optimizer [48] with default Py Torch [49] settings' but does not specify version numbers for PyTorch or other key software dependencies.
Experiment Setup	Yes	All models are trained for at most 2 million gradient steps with early-stopping using a batch size of 1024. In all experiments we use a batch size of 1024, 1e-3 weight decay and the Adam optimizer [48] with default Py Torch [49] settings. We do a grid search over {0.0001, 0.00001, 0.00005} to find the best learning rate for both models. Near the end of training when performance plateaus, we decrease learning rates to boost performance. We apply the Gumbel Softmax [50] with a temperature of 2 to the lookup vectors to encourage sparsity.