MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encoding

Authors: Rajesh Jayaram, Laxman Dhulipala, Majid Hadian, Jason D. Lee, Vahab Mirrokni

NeurIPS 2024

Reproducibility variables, results, and LLM responses:
Research Type: Experimental. Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5× fewer candidates. Compared to prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets, achieving an average of 10% improved recall with 90% lower latency.
Researcher Affiliation: Collaboration. Laxman Dhulipala (Google Research and UMD), Majid Hadian (Google DeepMind), Rajesh Jayaram (Google Research), Jason Lee (Google Research), Vahab Mirrokni (Google Research).
Pseudocode: Yes. Figure 2: FDE Generation Process. Three SimHashes (k_sim = 3) split space into six regions labelled A-F (in high dimensions B = 2^k_sim, but B = 6 here since d = 2). F_q(Q) and F_doc(P) are shown as B × d matrices, where the k-th row is q_(k), p_(k). The actual FDEs are flattened versions of these matrices. Not shown: inner projections, repetitions, and fill_empty_clusters.
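To make the figure's description concrete, below is a minimal single-repetition sketch of FDE generation in NumPy. It follows the caption: SimHash sign bits partition token embeddings into B = 2^k_sim clusters, queries sum their token embeddings within each cluster while documents average theirs, and the resulting B × d matrix is flattened. Like the figure, it omits inner projections, repetitions, and fill_empty_clusters; the function names are illustrative, not the authors' code.

import numpy as np

def simhash_buckets(vectors, gaussians):
    # Map each token embedding to one of B = 2^k_sim buckets via SimHash sign bits.
    bits = (vectors @ gaussians.T > 0).astype(np.int64)      # (n_tokens, k_sim)
    return bits @ (2 ** np.arange(gaussians.shape[0]))       # bucket id per token

def fde_single_rep(token_embeddings, gaussians, is_query):
    # One repetition of a fixed dimensional encoding: a B x d matrix, flattened.
    k_sim = gaussians.shape[0]
    B, d = 2 ** k_sim, token_embeddings.shape[1]
    buckets = simhash_buckets(token_embeddings, gaussians)
    fde = np.zeros((B, d))
    for b in range(B):
        members = token_embeddings[buckets == b]
        if len(members) == 0:
            continue                                          # fill_empty_clusters omitted
        # Queries sum their tokens in each cluster; documents average them.
        fde[b] = members.sum(axis=0) if is_query else members.mean(axis=0)
    return fde.reshape(-1)

# Example: shared SimHash directions, one query and one document of random token embeddings.
rng = np.random.default_rng(0)
g = rng.standard_normal((3, 16))                              # k_sim = 3, d = 16
q_fde = fde_single_rep(rng.standard_normal((32, 16)), g, is_query=True)
p_fde = fde_single_rep(rng.standard_normal((80, 16)), g, is_query=False)
score = q_fde @ p_fde  # one inner product approximates the multi-vector (Chamfer) similarity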
Open Source Code: No. Our end-to-end retrieval engine is implemented in C++ in a proprietary codebase, preventing us from directly releasing it. As described in Section 3.2, we plan to publish a standalone open-source implementation of the FDE generation step upon publication, along with the product quantization code (which is a textbook method) and the ball-carving code.
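The product quantization component is described above as a textbook method; for readers unfamiliar with it, here is a rough NumPy sketch of such a quantizer (one k-means codebook per block of coordinates, then one byte-sized codeword per block). The block size, codebook size, and function names are illustrative assumptions, not the paper's forthcoming release.

import numpy as np

def train_pq(X, block_dims=8, codebook_size=256, iters=10, seed=0):
    # Textbook product quantization; assumes at least codebook_size training vectors.
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(X.shape[1]), X.shape[1] // block_dims)
    codebooks = []
    for cols in blocks:
        sub = X[:, cols]
        centers = sub[rng.choice(len(sub), size=codebook_size, replace=False)].copy()
        for _ in range(iters):
            # Assign each vector's block to its nearest center, then recompute the centers.
            d2 = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            for c in range(codebook_size):
                pts = sub[assign == c]
                if len(pts):
                    centers[c] = pts.mean(axis=0)
        codebooks.append((cols, centers))
    return codebooks

def pq_encode(X, codebooks):
    # Encode each vector as one codeword index (uint8) per block.
    codes = []
    for cols, centers in codebooks:
        d2 = ((X[:, cols][:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        codes.append(d2.argmin(axis=1))
    return np.stack(codes, axis=1).astype(np.uint8)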
Open Datasets: Yes. Our evaluation includes results from six of the well-studied BEIR [46] information retrieval datasets: MS MARCO [40] (CC BY-SA 4.0), HotpotQA (CC BY-SA 4.0) [53], NQ (Apache-2.0) [31], Quora (Apache-2.0) [46], SciDocs (CC BY 4.0) [11], and ArguAna (Apache-2.0) [47].
Dataset Splits: Yes. Following [43], we use the development set for our experiments on MS MARCO, and use the test set on the other datasets.
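Assuming the public beir Python package, the dataset loading and the split convention quoted above can be reproduced roughly as follows; the dataset name and output directory are placeholders.

from beir import util
from beir.datasets.data_loader import GenericDataLoader

dataset = "scidocs"  # e.g. "msmarco", "hotpotqa", "nq", "quora", "scidocs", "arguana"
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
data_path = util.download_and_unzip(url, "datasets")

# Development set for MS MARCO, test set for the other datasets (as in the paper).
split = "dev" if dataset == "msmarco" else "test"
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split=split)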
Hardware Specification: Yes. We run our online experiments on an Intel Sapphire Rapids machine on Google Cloud (c3-standard-176). The machine supports up to 176 hyper-threads.
Software Dependencies: No. Insufficient information: the paper mentions 'implemented in C++' and uses 'DiskANN [25]', but does not provide specific version numbers for any software components, libraries, or solvers.
Experiment Setup: Yes. We perform a grid search over FDE parameters R_reps ∈ {1, 5, 10, 15, 20}, k_sim ∈ {2, 3, 4, 5, 6}, d_proj ∈ {8, 16, 32, 64}... Our single-vector retrieval engine uses a scalable implementation [38] of DiskANN [25]... We build DiskANN indices by using the uncompressed document FDEs with a maximum degree of 200 and a build beam-width of 600... Based on these empirical results, we choose the value of τ = 0.7 in our end-to-end experiments.
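The quoted grid can be enumerated directly: each repetition contributes B = 2^k_sim clusters of d_proj coordinates, so the final FDE dimension is d_FDE = R_reps * 2^k_sim * d_proj. The sketch below lists every combination alongside the DiskANN build settings and ball-carving threshold quoted above; it is only an illustration of the search space, not the authors' tuning harness.

from itertools import product

R_REPS = [1, 5, 10, 15, 20]
K_SIM = [2, 3, 4, 5, 6]
D_PROJ = [8, 16, 32, 64]

MAX_DEGREE = 200        # DiskANN maximum degree (from the quote above)
BUILD_BEAM_WIDTH = 600  # DiskANN build beam-width
TAU = 0.7               # ball-carving threshold used in the end-to-end experiments

for r_reps, k_sim, d_proj in product(R_REPS, K_SIM, D_PROJ):
    d_fde = r_reps * (2 ** k_sim) * d_proj  # B = 2^k_sim clusters per repetition
    print(f"R_reps={r_reps:2d}  k_sim={k_sim}  d_proj={d_proj:2d}  ->  d_FDE={d_fde}")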