Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
AdANNS: A Framework for Adaptive Semantic Search
Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF [48] at the same compute budget; and matches accuracy while being up to 90× faster in wall-clock time. |
| Researcher Affiliation | Collaboration | University of Washington, Google Research, Harvard University |
| Pseudocode | Yes | Algorithm 1 AdANNS-IVF Pseudocode |
| Open Source Code | Yes | Code is open-sourced at https://github.com/RAIVNLab/AdANNS. |
| Open Datasets | Yes | We experiment with two public datasets: (a) ImageNet-1K [45] dataset on the task of image retrieval where the goal is to retrieve images from a database (1.3M image train set) belonging to the same class as the query image (50K image validation set) and (b) Natural Questions (NQ) [32] dataset on the task of question answering through dense passage retrieval where the goal is to retrieve the relevant passage from a database (21M Wikipedia passages) for a query (3.6K questions). |
| Dataset Splits | Yes | ImageNet-1K [45] dataset... (50K image validation set) and Natural Questions (NQ) [32] dataset... The training set contains 79,168 question and answer pairs, the dev set has 8,757 pairs and the test set has 3,610 pairs. |
| Hardware Specification | Yes | All ANNS experiments (...) were run on an Intel Xeon 2.20GHz CPU with 12 cores. Exact Search (...) and DiskANN experiments were run with CUDA 11.0 on an A100-SXM4 NVIDIA GPU with 40G RAM. |
| Software Dependencies | Yes | Exact Search (Flat L2, PQ, OPQ) and DiskANN experiments were run with CUDA 11.0 on an A100-SXM4 NVIDIA GPU with 40G RAM. |
| Experiment Setup | Yes | The default setting in this work, unless otherwise stated, is $n_p = 1$, $k = 1024$, $N_D = 1281167$ (ImageNet-1K trainset). AdANNS-IVF is evaluated for all possible tuples of $d_c, d_s, k = \lvert C \rvert \in \{8, 16, \ldots, 2048\}$. We experimented with 8–128 byte OPQ budgets. |
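
To give a sense of the size of the AdANNS-IVF sweep quoted in the Experiment Setup row, the following is a minimal sketch that enumerates the tuples. It assumes that the ellipsis in {8, 16, ..., 2048} stands for powers of two and that all three quantities range over the same set; neither assumption is stated in the quoted text. The names `DIMS` and `adanns_ivf_grid` are hypothetical and do not come from the paper or its repository.

```python
from itertools import product

# Assumption: {8, 16, ..., 2048} is read as powers of two (2^3 through 2^11).
DIMS = [2 ** e for e in range(3, 12)]


def adanns_ivf_grid(dims=DIMS):
    """Enumerate every (d_c, d_s, k) tuple over the assumed value set.

    Hypothetical helper for illustration only; it builds the configuration
    list and does not run any search.
    """
    return list(product(dims, repeat=3))


grid = adanns_ivf_grid()
print(f"{len(DIMS)} values per axis, {len(grid)} configurations in total")
```

Under these assumptions each axis takes nine values, so the full sweep is the Cartesian product of three nine-element sets.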