Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
RouterRetriever: Routing over a Mixture of Expert Embedding Models
Authors: Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation on the BEIR benchmark demonstrates that RouterRetriever outperforms both models trained on MSMARCO (+2.1 absolute nDCG@10) and multi-task models (+3.2). This is achieved by employing our routing mechanism, which surpasses other routing techniques (+1.8 on average). Furthermore, the benefit generalizes well to other datasets, even in the absence of a specific expert on the dataset. RouterRetriever is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models as an alternative to a single, general-purpose embedding model, especially when retrieving from diverse, specialized domains. |
| Researcher Affiliation | Collaboration | Hyunji Lee¹*, Luca Soldaini², Arman Cohan²,³, Minjoon Seo¹, Kyle Lo²; ¹KAIST AI, ²Allen Institute for AI, ³Yale University; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Constructing Pilot Embedding Library |
| Open Source Code | Yes | Code https://github.com/amy-hyunji/RouterRetriever |
| Open Datasets | Yes | We use the provided training and test sets in the BEIR benchmark (Thakur et al. 2021). |
| Dataset Splits | Yes | We use the provided training and test sets in the BEIR benchmark (Thakur et al. 2021). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | The paper mentions using Contriever and LoRA as methods but does not specify software versions for libraries or programming languages used for implementation. |
| Experiment Setup | Yes | For training, we adopt the few-shot hyperparameters from Izacard et al. (2021): a learning rate of 1e-4, a batch size of 256 with in-batch negatives, and a maximum of 500 epochs with early stopping. |
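
The training setup quoted in the Experiment Setup row can be sketched as a small configuration plus two helpers. This is a minimal illustration, not the authors' implementation: the names `TrainConfig`, `in_batch_negative_loss`, and `should_stop` are hypothetical, and the `temperature` and `patience` values are assumptions, since the row does not state them. Only the learning rate, batch size, and epoch limit come from the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the "Experiment Setup" row.
    learning_rate: float = 1e-4
    batch_size: int = 256
    max_epochs: int = 500
    # Assumptions for illustration only; not stated in the table.
    temperature: float = 0.05
    patience: int = 5


def in_batch_negative_loss(scores, temperature):
    """Mean cross-entropy where scores[i][j] is sim(query i, passage j).

    The diagonal entry is the positive pair; every other passage in the
    batch serves as a negative for that query.
    """
    total = 0.0
    for i, row in enumerate(scores):
        scaled = [s / temperature for s in row]
        peak = max(scaled)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
        total += log_norm - scaled[i]
    return total / len(scores)


def should_stop(val_history, patience):
    """Return True once the best validation score is `patience` epochs old."""
    if len(val_history) <= patience:
        return False
    best_epoch = max(range(len(val_history)), key=val_history.__getitem__)
    return len(val_history) - 1 - best_epoch >= patience
```

With in-batch negatives, a batch of 256 query-passage pairs gives each query 255 negatives at no extra encoding cost, which is why the batch size matters for this kind of contrastive training. In a training loop, `should_stop` would be checked after each validation pass, with `max_epochs` as the hard upper bound.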