CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search
Authors: Ming Yang, Yuzheng Cai, Weiguo Zheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we conduct extensive experiments on benchmark datasets. The experimental results confirm that existing graph-based methods can be significantly outperformed by incorporating CSPG, achieving 1.5x to 2x speedups in QPS at almost all recall levels. |
| Researcher Affiliation | Academia | Ming Yang, Yuzheng Cai, Weiguo Zheng; School of Data Science, Fudan University, China; {yangm24, yuzhengcai21}@m.fudan.edu.cn, zhengweiguo@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Search on CSPG index; Algorithm 2 CSPG index construction; Algorithm 3 CSPG vector insertion; Algorithm 4 CSPG vector deletion |
| Open Source Code | Yes | All our source codes are available at https://github.com/PUITAR/CSPG. |
| Open Datasets | Yes | As summarized in Table 3, four benchmark datasets are used in our experiments, which are among the most commonly used public datasets from Ann-Benchmarks [48]. (See the dataset-loading sketch below the table.) |
| Dataset Splits | No | The paper uses benchmark datasets like SIFT1M, GIST1M, DEEP1M, SIFT10M, but does not explicitly describe the training, validation, or test splits. It relies on standard benchmarks which typically have predefined splits, but these are not specified within the text. |
| Hardware Specification | Yes | All experiments are conducted on a machine with Intel Xeon Gold 6136 CPU @3.00GHz and 128GB memory. |
| Software Dependencies | No | The paper mentions using specific graph-based ANNS algorithms (HNSW, Vamana, HCNNG) and provides a link to source code, but does not explicitly list software dependencies with specific version numbers in the text. |
| Experiment Setup | Yes | The detailed index construction parameters are listed as follows... HNSW. The degree upper bound M = 32, and efConstruction = 128. Vamana. The degree upper bound R = 32, beam size L = 128, and pruning parameter α = 1.2. HCNNG. The number of cluster trees T = 10, the leaf size of MST Ls = 1000, and the MST degree s = 5. CSPG. For fairness, for each graph-based algorithm used in CSPG, we slightly adjust its parameters to ensure the average degree of the built graphs is the same as in the original algorithm. The parameter ef for each baseline algorithm and ef2 for the CSPG method are uniformly picked from [10, 300] to obtain QPS at different recall levels. For the CSPG method, we set ef1 = 1 and sampling ratio λ = 0.5 by default. (See the parameter sketch below the table.) |
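
The Open Datasets and Dataset Splits rows can be read together: Ann-Benchmarks datasets ship as HDF5 files with a fixed base/query split and precomputed ground truth, so the paper does not need to define its own train/validation/test split. The sketch below assumes one of the standard Ann-Benchmarks files (e.g. `sift-128-euclidean.hdf5`), which is not named explicitly in the excerpt above.

```python
# Sketch (assumption): how an Ann-Benchmarks HDF5 file exposes its predefined splits.
import h5py

with h5py.File("sift-128-euclidean.hdf5", "r") as f:
    base = f["train"][:]      # base vectors to be indexed (e.g. 1M x 128 for SIFT1M)
    queries = f["test"][:]    # held-out query vectors
    gt = f["neighbors"][:]    # ground-truth nearest-neighbor ids per query

print(base.shape, queries.shape, gt.shape)
```

The Experiment Setup row lists the HNSW baseline parameters (M = 32, efConstruction = 128) and an ef sweep over [10, 300] to trace the QPS-recall curve. The following sketch illustrates those settings using the independent `hnswlib` Python package with placeholder random data; it is not the authors' C++ implementation at https://github.com/PUITAR/CSPG and does not include the CSPG-specific parameters ef1, ef2, or λ.

```python
# Sketch (assumption): reported HNSW baseline settings, reproduced with hnswlib.
import numpy as np
import hnswlib

dim, n = 128, 100_000
base = np.random.rand(n, dim).astype(np.float32)       # placeholder for base vectors
queries = np.random.rand(1_000, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, M=32, ef_construction=128)
index.add_items(base)

# Sweep ef to trade recall for QPS, as in the paper's evaluation protocol.
for ef in (10, 50, 100, 200, 300):
    index.set_ef(ef)
    labels, dists = index.knn_query(queries, k=10)
```
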
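Both sketches are illustrative only: file names, data shapes, and the choice of `h5py`/`hnswlib` are assumptions made for readability, not details drawn from the paper.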