Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search

Authors: Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments demonstrate that equipping PEOs can increase throughput on commonly utilized graph indexes (HNSW and NSSG) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times. The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs .
Researcher Affiliation Academia Kejing Lu 1 Chuan Xiao 1 2 Yoshiharu Ishikawa 1 Graduate School of Informatics, Nagoya University, Japan 2Graduate School of Information Science and Technology, Osaka University, Japan. Correspondence to: Kejing Lu <lu@db.is.i.nagoya-u.ac.jp>.
Pseudocode Yes Algorithm 1: Graph-based ANNS with routing and Algorithm 2: PEOs Test
Open Source Code Yes The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs .
Open Datasets Yes The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . We use six million-scale datasets and a 100M dataset for scalability test. The statistics can be found in Table 1.
Dataset Splits No The paper does not explicitly provide training/validation/test dataset splits, specific percentages, or k-fold cross-validation setups for the datasets used in experiments. It mentions parameters like 'efc' for search configuration but not for data partitioning.
Hardware Specification Yes All the experiments were performed on a PC with Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz.
Software Dependencies No All the compared methods were implemented in C++ but no specific version numbers for compilers or libraries are provided.
Experiment Setup Yes We use the following parameter setup for the competitors: (1) HNSW. M = 32, efc = 2000 for the two Glo Ve datasets and efc = 1000 for the other datasets. ... (6) PEOs (+HNSW). Based on the analysis in Sec. 6.2, L is set to 8, 8, 10, 15, 16, 20 on the six datasets sorted by ascending order of dimension. ϵ is set to 0.2 and m = 128 such that every vector ID can be encoded by one byte.