Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search
Authors: Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that equipping PEOs can increase throughput on commonly utilized graph indexes (HNSW and NSSG) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times. The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . |
| Researcher Affiliation | Academia | Kejing Lu 1 Chuan Xiao 1 2 Yoshiharu Ishikawa 1 Graduate School of Informatics, Nagoya University, Japan 2Graduate School of Information Science and Technology, Osaka University, Japan. Correspondence to: Kejing Lu <lu@db.is.i.nagoya-u.ac.jp>. |
| Pseudocode | Yes | Algorithm 1: Graph-based ANNS with routing and Algorithm 2: PEOs Test |
| Open Source Code | Yes | The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . |
| Open Datasets | Yes | The code and datasets used for our evaluations are publicly accessible at https: //github.com/ICML2024-code/PEOs . We use six million-scale datasets and a 100M dataset for scalability test. The statistics can be found in Table 1. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, specific percentages, or k-fold cross-validation setups for the datasets used in experiments. It mentions parameters like 'efc' for search configuration but not for data partitioning. |
| Hardware Specification | Yes | All the experiments were performed on a PC with Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz. |
| Software Dependencies | No | All the compared methods were implemented in C++ but no specific version numbers for compilers or libraries are provided. |
| Experiment Setup | Yes | We use the following parameter setup for the competitors: (1) HNSW. M = 32, efc = 2000 for the two Glo Ve datasets and efc = 1000 for the other datasets. ... (6) PEOs (+HNSW). Based on the analysis in Sec. 6.2, L is set to 8, 8, 10, 15, 16, 20 on the six datasets sorted by ascending order of dimension. ϵ is set to 0.2 and m = 128 such that every vector ID can be encoded by one byte. |