MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Authors: Rajesh Jayaram, Laxman Dhulipala, Majid Hadian, Jason D. Lee, Vahab Mirrokni
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5× fewer candidates. Compared to prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets, achieving an average of 10% improved recall with 90% lower latency. |
| Researcher Affiliation | Collaboration | Laxman Dhulipala (Google Research and UMD); Majid Hadian (Google DeepMind); Rajesh Jayaram (Google Research); Jason Lee (Google Research); Vahab Mirrokni (Google Research) |
| Pseudocode | Yes | Figure 2: FDE Generation Process. Three SimHashes (k_sim = 3) split space into six regions labelled A-F (in high dimensions B = 2^{k_sim}, but B = 6 here since d = 2). F_q(Q), F_doc(P) are shown as B × d matrices, where the k-th row is q_(k), p_(k). The actual FDEs are flattened versions of these matrices. Not shown: inner projections, repetitions, and fill_empty_clusters. (An illustrative sketch of this generation process appears after the table.) |
| Open Source Code | No | Our end-to-end retrieval engine is implemented in C++ in a proprietary codebase, preventing us from directly releasing it. As described in Section 3.2, we plan to publish a standalone open-source implementation of the FDE generation step upon publication, along with the product quantization code (which is a textbook method) and the ball-carving code. |
| Open Datasets | Yes | Datasets. Our evaluation includes results from six of the well-studied BEIR [46] information retrieval datasets: MS MARCO [40] (CC BY-SA 4.0), HotpotQA (CC BY-SA 4.0) [53], NQ (Apache-2.0) [31], Quora (Apache-2.0) [46], SciDocs (CC BY 4.0) [11], and ArguAna (Apache-2.0) [47]. |
| Dataset Splits | Yes | Following [43], we use the development set for our experiments on MS MARCO, and use the test set on the other datasets. |
| Hardware Specification | Yes | Experimental Setup. We run our online experiments on an Intel Sapphire Rapids machine on Google Cloud (c3-standard-176). The machine supports up to 176 hyper-threads. |
| Software Dependencies | No | Insufficient information. The paper mentions 'implemented in C++' and uses 'DiskANN [25]', but does not provide specific version numbers for any software components, libraries, or solvers. |
| Experiment Setup | Yes | We perform a grid search over FDE parameters R_reps ∈ {1, 5, 10, 15, 20}, k_sim ∈ {2, 3, 4, 5, 6}, d_proj ∈ {8, 16, 32, 64}... Our single-vector retrieval engine uses a scalable implementation [38] of DiskANN [25]... We build DiskANN indices by using the uncompressed document FDEs with a maximum degree of 200 and a build beam-width of 600... Based on these empirical results, we choose the value of τ = 0.7 in our end-to-end experiments. |
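
The FDE construction summarized in the Pseudocode row above can be illustrated with a short sketch. The following is a minimal single-repetition Python/NumPy version, assuming the behaviour described in the paper: queries sum their clustered vectors while documents average them, and their inner product serves as a single-vector proxy for Chamfer similarity. The hash-distance rule for fill_empty_clusters, the inner projection to d_proj, and the R_reps repetitions are omitted, and the names (`simhash_buckets`, `fde`) are illustrative rather than the authors' API.

```python
import numpy as np

def simhash_buckets(X, G):
    """Map each row of X to one of B = 2**k_sim buckets via the signs of <g_j, x>."""
    bits = (X @ G.T) > 0                     # (n, k_sim) sign pattern per vector
    weights = 1 << np.arange(G.shape[0])     # 1, 2, 4, ... -> integer bucket id
    return bits @ weights

def fde(X, G, is_query):
    """Single-repetition FDE: one d-dim block per bucket, flattened to length B*d.
    Query blocks sum the vectors in a bucket; document blocks average them
    (empty document buckets are filled here with the overall centroid, a
    simplification of the paper's hash-distance fill rule)."""
    k_sim, d = G.shape[0], X.shape[1]
    B = 2 ** k_sim
    buckets = simhash_buckets(X, G)
    blocks = np.zeros((B, d))
    counts = np.zeros(B)
    for b, x in zip(buckets, X):
        blocks[b] += x
        counts[b] += 1
    if not is_query:
        nonempty = counts > 0
        blocks[nonempty] /= counts[nonempty][:, None]
        blocks[~nonempty] = X.mean(axis=0)   # simplified fill_empty_clusters
    return blocks.reshape(-1)

# Toy usage: the FDE dot product approximates Chamfer similarity between the sets.
rng = np.random.default_rng(0)
d, k_sim = 16, 3
G = rng.standard_normal((k_sim, d))          # shared SimHash directions
Q = rng.standard_normal((4, d))              # query multi-vector embedding
P = rng.standard_normal((30, d))             # document multi-vector embedding
score = fde(Q, G, is_query=True) @ fde(P, G, is_query=False)
```

Because documents store per-bucket centroids, the flattened dot product sums, over query vectors, each query vector's similarity to the centroid of the document vectors that fell into the same SimHash bucket, which is what lets a single fixed-dimensional inner product stand in for the multi-vector Chamfer score.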
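
For reference, the parameters quoted in the Experiment Setup row can be collected into a small configuration sketch; the dictionary layout and names below are illustrative, not the authors' configuration format.

```python
# Illustrative summary of the quoted experimental parameters (names are ours).
FDE_PARAM_GRID = {
    "R_reps": [1, 5, 10, 15, 20],   # number of independent repetitions
    "k_sim":  [2, 3, 4, 5, 6],      # SimHash bits, giving B = 2**k_sim clusters
    "d_proj": [8, 16, 32, 64],      # inner projection dimension
}
DISKANN_BUILD = {"max_degree": 200, "build_beam_width": 600}
BALL_CARVING_THRESHOLD = 0.7        # tau used in the end-to-end experiments
```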