Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory

Authors: Jie Ren, Minjia Zhang, Dong Li

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct extensive evaluation and show that on two billion-scale datasets, HM-ANN provides 95% top-1 recall in less than one millisecond; HM-ANN outperforms state-of-the-art compression-based solutions such as L&C [13] and IMI+OPQ [12] in terms of recall-vs-latency by a large margin, getting 46% higher recall under the same search latency budget;
Researcher Affiliation Collaboration Jie Ren University of California, Merced EMAIL Minjia Zhang Microsoft Research EMAIL Dong Li University of California, Merced EMAIL
Pseudocode Yes Algorithm 1: HM-ANN Graph Construction Algorithm. Input: vector set V ,vector dimension d, number of established connection M, size of dynamic candidate list ef Construction Output: Multi-layer graph HM-ANN Parameters :# of nodes in layer i Ni, HM-ANN layer depth l
Open Source Code No The paper refers to open-source tools used for comparison (e.g., FAISS [1] and NSG [2] with GitHub links), but does not state that the code for HM-ANN itself is open-source or provide a link to it.
Open Datasets Yes We use five datasets, BIGANN [22], DEEP1B [6], SIFT1M [22], DEEP1M [6], and GIST1M [4]. BIGANN contains one billion of 128-dimensional SIFT descriptors as a base set and 10,000 query vectors. DEEP1B contains one billion of 96-dimensional feature vectors of natural images and 10,000 queries. SIFT1M and DEEP1M are one-million subset vectors in BIGANN and DEEP1B respectively. GIST1M contains one-million 960-dimensional image descriptors.
Dataset Splits No The paper mentions '10,000 query vectors' for BIGANN and DEEP1B, which serve as test queries, but does not provide explicit train/validation/test dataset splits or their sizes for the main datasets (BIGANN, DEEP1B, SIFT1M, DEEP1M, GIST1M).
Hardware Specification Yes All experiments are done on a machine with Intel Xeon Gold 6252 CPU@2.3GHz. It uses DDR4 (96GB) as fast memory and Optane DC PMM (1.5TB) as slow memory.
Software Dependencies No The paper mentions 'Faiss [1]' but does not provide specific version numbers for any software components or libraries used.
Experiment Setup Yes For HNSW, we build graphs with ef Construction and M set to 200 and 48 respectively; For NSG we first build a 100-NN graph using Faiss [1] and then build NSG graphs with R = 128, L = 70 and C = 500. We collect results on NSG and HNSW using Memory Mode, since it leads to overall better performance than using first-touch NUMA (see Section 4.3 for the comparison of the two). For IMI+OPQ, we build indexes with 64and 80-byte code-books on BIGANN and DEEP1B respectively. We present the best search result with search parameters nprobe=128 and ht=30 for BIGANN and with autotuning parameter sweep on DEEP1B. For L&C, we use 6 as the number of links on the base level, and use 36and 144-byte OPQ code-books. We use the same parameters (ef Construction=200 and M=48) as HNSW to construct HM-ANN. We set ef Search L0=2 and vary ef Search L1 to show the latency-vs-recall trade-offs.