Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encoding
Authors: Rajesh Jayaram, Laxman Dhulipala, Majid Hadian, Jason D. Lee, Vahab Mirrokni
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5× fewer candidates. Compared to prior state of the art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets, achieving an average of 10% improved recall with 90% lower latency. |
| Researcher Affiliation | Collaboration | Laxman Dhulipala (Google Research and UMD); Majid Hadian (Google DeepMind); Rajesh Jayaram (Google Research); Jason Lee (Google Research); Vahab Mirrokni (Google Research) |
| Pseudocode | Yes | Figure 2: FDE Generation Process. Three SimHashes ($k_{\text{sim}} = 3$) split space into six regions labelled A-F (in high dimensions $B = 2^{k_{\text{sim}}}$, but $B = 6$ here since $d = 2$). $F_q(Q)$, $F_{\text{doc}}(P)$ are shown as $B \times d$ matrices, where the $k$-th row is $q_{(k)}$, $p_{(k)}$. The actual FDEs are flattened versions of these matrices. Not shown: inner projections, repetitions, and fill_empty_clusters. |
| Open Source Code | No | Our end-to-end retrieval engine is implemented in C++ in a proprietary codebase, preventing us from directly releasing it. As described in Section 3.2, we plan to publish a standalone open-source implementation of the FDE generation step upon publication, along with the product quantization code (which is a textbook method) and the ball-carving code. |
| Open Datasets | Yes | Datasets. Our evaluation includes results from six of the well-studied BEIR [46] information retrieval datasets: MS MARCO [40] (CC BY-SA 4.0), HotpotQA (CC BY-SA 4.0) [53], NQ (Apache-2.0) [31], Quora (Apache-2.0) [46], SciDocs (CC BY 4.0) [11], and ArguAna (Apache-2.0) [47]. |
| Dataset Splits | Yes | Following [43], we use the development set for our experiments on MS MARCO, and use the test set on the other datasets. |
| Hardware Specification | Yes | Experimental Setup. We run our online experiments on an Intel Sapphire Rapids machine on Google Cloud (c3-standard-176). The machine supports up to 176 hyper-threads. |
| Software Dependencies | No | Insufficient information. The paper mentions 'implemented in C++' and uses 'DiskANN [25]', but does not provide specific version numbers for any software components, libraries, or solvers. |
| Experiment Setup | Yes | We perform a grid search over FDE parameters $R_{\text{reps}} \in \{1, 5, 10, 15, 20\}$, $k_{\text{sim}} \in \{2, 3, 4, 5, 6\}$, $d_{\text{proj}} \in \{8, 16, 32, 64\}$... Our single-vector retrieval engine uses a scalable implementation [38] of DiskANN [25]... We build DiskANN indices by using the uncompressed document FDEs with a maximum degree of 200 and a build beam-width of 600... Based on these empirical results, we choose the value of τ = 0.7 in our end-to-end experiments. |
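
The FDE generation step quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' implementation (their engine is proprietary C++). It assumes that each query block is the sum of the vectors hashed to a bucket and each document block is their average, and it omits the steps the caption lists as not shown (inner projections, repetitions, and fill_empty_clusters). The function names `make_hyperplanes`, `simhash_bucket`, and `fde` are hypothetical.

```python
import random


def make_hyperplanes(k_sim, d, seed=0):
    """Draw k_sim random Gaussian hyperplane normals in d dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k_sim)]


def simhash_bucket(vec, hyperplanes):
    """Map a vector to a bucket id in [0, 2**k_sim) using the sign of each projection."""
    bucket = 0
    for normal in hyperplanes:
        dot = sum(a * b for a, b in zip(normal, vec))
        bucket = (bucket << 1) | (1 if dot > 0 else 0)
    return bucket


def fde(vectors, hyperplanes, is_query):
    """Build a flattened B x d encoding, with B = 2**k_sim buckets.

    Assumption: query blocks hold per-bucket sums, document blocks hold
    per-bucket averages. Empty buckets are left as zeros, since
    fill_empty_clusters is not modelled in this sketch.
    """
    d = len(vectors[0])
    num_buckets = 2 ** len(hyperplanes)
    blocks = [[0.0] * d for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for v in vectors:
        b = simhash_bucket(v, hyperplanes)
        counts[b] += 1
        for i, x in enumerate(v):
            blocks[b][i] += x
    if not is_query:
        for b in range(num_buckets):
            if counts[b] > 0:
                blocks[b] = [x / counts[b] for x in blocks[b]]
    # Flatten the B x d matrix row by row into a single vector.
    return [x for block in blocks for x in block]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))
```

With `k_sim = 3` and `d = 2`, as in the figure, `fde` returns a vector of length 2^3 · 2 = 16. Only six of the eight bucket ids can be non-empty in two dimensions, which matches the six regions A-F in the caption. Under these assumptions, `dot(fde(Q, planes, True), fde(P, planes, False))` is the single inner product that stands in for the multi-vector similarity between `Q` and `P`.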