Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search

Authors: Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa

ICML 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments demonstrate that equipping PEOs can increase throughput on commonly utilized graph indexes (HNSW and NSSG) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times. The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs .
Researcher Affiliation Academia Kejing Lu 1 Chuan Xiao 1 2 Yoshiharu Ishikawa 1 Graduate School of Informatics, Nagoya University, Japan 2Graduate School of Information Science and Technology, Osaka University, Japan. Correspondence to: Kejing Lu <EMAIL>.
Pseudocode Yes Algorithm 1: Graph-based ANNS with routing and Algorithm 2: PEOs Test
Open Source Code Yes The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs .
Open Datasets Yes The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . We use six million-scale datasets and a 100M dataset for scalability test. The statistics can be found in Table 1.
Dataset Splits No The paper does not explicitly provide training/validation/test dataset splits, specific percentages, or k-fold cross-validation setups for the datasets used in experiments. It mentions parameters like 'efc' for search configuration but not for data partitioning.
Hardware Specification Yes All the experiments were performed on a PC with Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz.
Software Dependencies No All the compared methods were implemented in C++ but no specific version numbers for compilers or libraries are provided.
Experiment Setup Yes We use the following parameter setup for the competitors: (1) HNSW. M = 32, efc = 2000 for the two Glo Ve datasets and efc = 1000 for the other datasets. ... (6) PEOs (+HNSW). Based on the analysis in Sec. 6.2, L is set to 8, 8, 10, 15, 16, 20 on the six datasets sorted by ascending order of dimension. ϵ is set to 0.2 and m = 128 such that every vector ID can be encoded by one byte.