CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search
Authors: Ming Yang, Yuzheng Cai, Weiguo Zheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition, we conduct extensive experiments on benchmark datasets. The experimental results confirm that existing graph-based methods can be significantly outperformed by incorporating CSPG, achieving 1.5x to 2x speedups in QPS at almost all recall levels. |
| Researcher Affiliation | Academia | Ming Yang, Yuzheng Cai, Weiguo Zheng; School of Data Science, Fudan University, China; {yangm24, yuzhengcai21}@m.fudan.edu.cn, zhengweiguo@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Search on CSPG index; Algorithm 2 CSPG index construction; Algorithm 3 CSPG vector insertion; Algorithm 4 CSPG vector deletion |
| Open Source Code | Yes | All our source codes are available at https://github.com/PUITAR/CSPG. |
| Open Datasets | Yes | As summarized in Table 3, four benchmark datasets are used in our experiments, which are among the most commonly used public datasets from Ann-Benchmarks [48]. (See the dataset-loading sketch below the table.) |
| Dataset Splits | No | The paper uses benchmark datasets like SIFT1M, GIST1M, DEEP1M, SIFT10M, but does not explicitly describe the training, validation, or test splits. It relies on standard benchmarks which typically have predefined splits, but these are not specified within the text. |
| Hardware Specification | Yes | All experiments are conducted on a machine with Intel Xeon Gold 6136 CPU @3.00GHz and 128GB memory. |
| Software Dependencies | No | The paper mentions using specific graph-based ANNS algorithms (HNSW, Vamana, HCNNG) and provides a link to source code, but does not explicitly list software dependencies with specific version numbers in the text. |
| Experiment Setup | Yes | The detailed index construction parameters are listed as follows... HNSW. The degree upper bound M = 32, and efConstruction = 128. Vamana. The degree upper bound R = 32, beam size L = 128, and pruning parameter α = 1.2. HCNNG. The number of cluster trees T = 10, the leaf size of MST Ls = 1000, and the MST degree s = 5. CSPG. For fairness, for each graph-based algorithm used in CSPG, we slightly adjust its parameters to ensure the average degree of the built graphs is the same as in the original algorithm. The parameter ef for each baseline algorithm and ef2 for the CSPG method are uniformly picked from [10, 300] to obtain QPS at different recall levels. For the CSPG method, we set ef1 = 1 and sampling ratio λ = 0.5 by default. (See the parameter sketch below the table.) |
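
The Open Datasets and Dataset Splits rows can be read together: Ann-Benchmarks datasets ship as HDF5 files with a fixed base/query split and precomputed ground truth, so the paper does not need to define its own train/validation/test split. The sketch below assumes one of the standard Ann-Benchmarks files (e.g. `sift-128-euclidean.hdf5`), which is not named explicitly in the excerpt above.

```python
# Sketch (assumption): how an Ann-Benchmarks HDF5 file exposes its predefined splits.
import h5py

with h5py.File("sift-128-euclidean.hdf5", "r") as f:
    base = f["train"][:]      # base vectors to be indexed (e.g. 1M x 128 for SIFT1M)
    queries = f["test"][:]    # held-out query vectors
    gt = f["neighbors"][:]    # ground-truth nearest-neighbor ids per query

print(base.shape, queries.shape, gt.shape)
```

The Experiment Setup row lists the HNSW baseline parameters (M = 32, efConstruction = 128) and an ef sweep over [10, 300] to trace the QPS-recall curve. The following sketch illustrates those settings using the independent `hnswlib` Python package with placeholder random data; it is not the authors' C++ implementation at https://github.com/PUITAR/CSPG and does not include the CSPG-specific parameters ef1, ef2, or λ.

```python
# Sketch (assumption): reported HNSW baseline settings, reproduced with hnswlib.
import numpy as np
import hnswlib

dim, n = 128, 100_000
base = np.random.rand(n, dim).astype(np.float32)       # placeholder for base vectors
queries = np.random.rand(1_000, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, M=32, ef_construction=128)
index.add_items(base)

# Sweep ef to trade recall for QPS, as in the paper's evaluation protocol.
for ef in (10, 50, 100, 200, 300):
    index.set_ef(ef)
    labels, dists = index.knn_query(queries, k=10)
```
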
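Both sketches are illustrative only: file names, data shapes, and the choice of `h5py`/`hnswlib` are assumptions made for readability, not details drawn from the paper.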