Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search

Authors: Sebastian Bruch, Aditya Krishnan, Franco Maria Nardini

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test our proposal in Section 3 on a variety of ANN benchmark datasets. As our experiments show, our optimistic router achieves the same recall as state-of-the-art routers but with up to a 50% reduction in the total volume of points evaluated per query.
Researcher Affiliation	Collaboration	Sebastian Bruch (Northeastern University, Boston, MA, USA); Aditya Krishnan (Microsoft, New York, NY, USA); Franco Maria Nardini (ISTI-CNR, Pisa, Italy)
Pseudocode	Yes	Algorithm 1 Indexing and scoring a single partition with OPTIMIST
Open Source Code	Yes	Code: We have implemented all baseline and proposed routers in the Rust programming language. We have open-sourced our code along with experimental configuration to facilitate reproducibility. (Footnote 5: Available at https://github.com/Artificial-Memory-Lab/optimist-router)
Open Datasets	Yes	TEXT2IMAGE: ... The dataset is available under the terms of the Creative Commons Attribution 4.0 International license. MUSIC: ... To the best of our knowledge, the dataset comes with no information on the license under which it is made available (per https://github.com/stanis-morozov/ip-nsw?tab=readme-ov-file). DEEPIMAGE: ... The dataset is available under the terms of Apache license 2.0. GLOVE: ... The dataset is available under the terms of the Public Domain Dedication and license v1.0. MSMARCO-MINILM: ... The dataset is available under the terms of the Creative Commons Attribution 4.0 International license. NQ-ADA2: ... The dataset is available under the terms of Apache license 2.0.
Dataset Splits	Yes	As for train-test splits, the benchmark datasets used in our work come with a test query set that is separate from the data points.
Hardware Specification	Yes	Latency: ... We run experiments on AWS c5.xlarge (4 vCPUs, 8GB memory). Baseline bandwidth of the SSD attached to this machine is 1,150Mbps. Experiments compute resources: ... For experiments that study the latency of different methods, we state the specifications of the hardware used.
Software Dependencies	No	Code: We have implemented all baseline and proposed routers in the Rust programming language. We have open-sourced our code along with experimental configuration to facilitate reproducibility. The paper mentions the programming language used but does not provide specific version numbers for any software or libraries.
Experiment Setup	Yes	OPTIMIST (t, δ): t and δ are parameters of Algorithm 1. By default, we set δ = 0.8 and t to values in Table 1, but study their effect in Appendix E. If unspecified, it should be understood that default parameters are used. SCANN (T): Similar to MEAN and NORMALIZEDMEAN, but where routing is determined by inner product between a query and the SCANN centroids (cf. Theorem 4.2 in [Guo et al., 2020]). SCANN has a single hyperparameter T, which we set to 0.5 after tuning. Clustering: For our main results, we partition the datasets with spherical KMeans [Dhillon and Modha, 2001]. ... We cluster each dataset into C = √m shards, where m is the number of data points in the dataset.
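
Illustration (not from the paper or its repository): the Experiment Setup row refers to partitioning data with spherical KMeans and to routers that rank partitions by the inner product between a query and a per-partition centroid. The Python sketch below shows that general pattern only. It is not OPTIMIST, not the SCANN router, and not the authors' Rust implementation. The function names (spherical_kmeans, route), the parameter names (num_clusters, num_probes), the iteration count, and the seeding are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def spherical_kmeans(points, num_clusters, iterations=10, seed=0):
    """Hypothetical helper: cluster by cosine similarity with unit-norm centroids."""
    rng = random.Random(seed)
    unit = [normalize(p) for p in points]
    # Assumption: initialize centroids from randomly sampled data points.
    centroids = [list(c) for c in rng.sample(unit, num_clusters)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the centroid with the largest cosine.
        assignment = [
            max(range(num_clusters), key=lambda c: dot(u, centroids[c]))
            for u in unit
        ]
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(num_clusters):
            members = [u for u, a in zip(unit, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return centroids, assignment


def route(query, centroids, num_probes):
    """Hypothetical helper: indices of the num_probes partitions whose
    centroids have the largest inner product with the query."""
    scores = [dot(query, c) for c in centroids]
    ranked = sorted(range(len(centroids)), key=lambda c: scores[c], reverse=True)
    return ranked[:num_probes]
```

A router of this kind returns a ranked list of partitions to probe; the points inside those partitions are then scored exactly against the query. The quantity the paper reports reducing is the total volume of points evaluated per query at a given recall.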