AdANNS: A Framework for Adaptive Semantic Search
Authors: Aniket Rege, Aditya Kusupati, Sharan Ranjit S, Alan Fan, Qingqing Cao, Sham Kakade, Prateek Jain, Ali Farhadi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF [48] at the same compute budget; and matches accuracy while being up to 90× faster in wall-clock time. |
| Researcher Affiliation | Collaboration | University of Washington, Google Research, Harvard University {kusupati,ali}@cs.washington.edu, prajain@google.com |
| Pseudocode | Yes | Algorithm 1: AdANNS-IVF Pseudocode |
| Open Source Code | Yes | Code is open-sourced at https://github.com/RAIVNLab/AdANNS. |
| Open Datasets | Yes | We experiment with two public datasets: (a) ImageNet-1K [45] dataset on the task of image retrieval where the goal is to retrieve images from a database (1.3M image train set) belonging to the same class as the query image (50K image validation set) and (b) Natural Questions (NQ) [32] dataset on the task of question answering through dense passage retrieval where the goal is to retrieve the relevant passage from a database (21M Wikipedia passages) for a query (3.6K questions). |
| Dataset Splits | Yes | ImageNet-1K [45] dataset... (50K image validation set) and Natural Questions (NQ) [32] dataset... The training set contains 79,168 question and answer pairs, the dev set has 8,757 pairs and the test set has 3,610 pairs. |
| Hardware Specification | Yes | All ANNS experiments (...) were run on an Intel Xeon 2.20GHz CPU with 12 cores. Exact Search (...) and DiskANN experiments were run with CUDA 11.0 on an A100-SXM4 NVIDIA GPU with 40G RAM. |
| Software Dependencies | Yes | Exact Search (Flat L2, PQ, OPQ) and DiskANN experiments were run with CUDA 11.0 on an A100-SXM4 NVIDIA GPU with 40G RAM. |
| Experiment Setup | Yes | The default setting in this work, unless otherwise stated, is n_p = 1, k = 1024, N_D = 1281167 (ImageNet-1K trainset). AdANNS-IVF is evaluated for all possible tuples of d_c, d_s, k = \|C\| ∈ {8, 16, ..., 2048}. We experimented with 8-128 byte OPQ budgets. |
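
The experiment setup row sweeps tuples of d_c, d_s and k. The following is a minimal pure-Python sketch of what such a sweep could look like, not the authors' implementation. It assumes that the elided set {8, 16, ..., 2048} is the powers of two, that d_c is the number of leading embedding dimensions used to cluster the database, that d_s is the number used to rank candidates inside the probed clusters, that k is the number of clusters, and that n_p is the number of clusters probed per query. All function names (`config_grid`, `build_index`, `search`) are hypothetical, and the clustering is a plain Lloyd-style k-means chosen for brevity.

```python
import itertools
import random

# Assumption: {8, 16, ..., 2048} means the powers of two from 8 to 2048.
GRID = [2 ** i for i in range(3, 12)]


def config_grid(values=GRID):
    """Enumerate every (d_c, d_s, k) tuple over the grid."""
    return list(itertools.product(values, repeat=3))


def sq_dist(a, b, d):
    """Squared L2 distance using only the first d coordinates."""
    return sum((a[i] - b[i]) ** 2 for i in range(d))


def nearest(centroids, x, d, n=1):
    """Indices of the n centroids closest to x on the first d coordinates."""
    order = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(centroids[c], x, d))
    return order[:n]


def assign_all(db, centroids, d_c):
    """Group database indices by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for idx, x in enumerate(db):
        clusters[nearest(centroids, x, d_c)[0]].append(idx)
    return clusters


def build_index(db, d_c, k, n_iter=10, seed=0):
    """Cluster the database on the first d_c dimensions (toy k-means)."""
    rng = random.Random(seed)
    centroids = [list(db[i][:d_c]) for i in rng.sample(range(len(db)), k)]
    for _ in range(n_iter):
        clusters = assign_all(db, centroids, d_c)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(db[i][j] for i in members) / len(members)
                                for j in range(d_c)]
    return centroids, assign_all(db, centroids, d_c)


def search(query, db, centroids, clusters, d_c, d_s, n_p=1, top=1):
    """Probe n_p clusters using d_c dims, then rank candidates using d_s dims."""
    probed = nearest(centroids, query, d_c, n_p)
    candidates = [i for c in probed for i in clusters[c]]
    candidates.sort(key=lambda i: sq_dist(db[i], query, d_s))
    return candidates[:top]
```

Under these assumptions, setting d_c equal to d_s corresponds to a conventional single-dimensionality IVF configuration, while decoupling them lets a sweep trade clustering cost against ranking cost independently.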