Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search
Authors: Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that equipping PEOs can increase throughput on commonly utilized graph indexes (HNSW and NSSG) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times. The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . |
| Researcher Affiliation | Academia | Kejing Lu 1 Chuan Xiao 1 2 Yoshiharu Ishikawa 1 Graduate School of Informatics, Nagoya University, Japan 2Graduate School of Information Science and Technology, Osaka University, Japan. Correspondence to: Kejing Lu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Graph-based ANNS with routing and Algorithm 2: PEOs Test |
| Open Source Code | Yes | The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . |
| Open Datasets | Yes | The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . We use six million-scale datasets and a 100M dataset for scalability test. The statistics can be found in Table 1. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, specific percentages, or k-fold cross-validation setups for the datasets used in experiments. It mentions parameters like 'efc' for search configuration but not for data partitioning. |
| Hardware Specification | Yes | All the experiments were performed on a PC with Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz. |
| Software Dependencies | No | All the compared methods were implemented in C++ but no specific version numbers for compilers or libraries are provided. |
| Experiment Setup | Yes | We use the following parameter setup for the competitors: (1) HNSW. M = 32, efc = 2000 for the two Glo Ve datasets and efc = 1000 for the other datasets. ... (6) PEOs (+HNSW). Based on the analysis in Sec. 6.2, L is set to 8, 8, 10, 15, 16, 20 on the six datasets sorted by ascending order of dimension. ϵ is set to 0.2 and m = 128 such that every vector ID can be encoded by one byte. |