Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

RouterRetriever: Routing over a Mixture of Expert Embedding Models

Authors: Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation on the BEIR benchmark demonstrates that ROUTERRETRIEVER outperforms both models trained on MSMARCO (+2.1 absolute nDCG@10) and multi-task models (+3.2). This is achieved by employing our routing mechanism, which surpasses other routing techniques (+1.8 on average). Furthermore, the benefit generalizes well to other datasets, even in the absence of a specific expert on the dataset. ROUTERRETRIEVER is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models as an alternative to a single, general-purpose embedding model, especially when retrieving from diverse, specialized domains.
Researcher Affiliation | Collaboration | Hyunji Lee1*, Luca Soldaini2, Arman Cohan2,3, Minjoon Seo1, Kyle Lo2 (1KAIST AI, 2Allen Institute for AI, 3Yale University)
Pseudocode | Yes | Algorithm 1: Constructing Pilot Embedding Library
Open Source Code | Yes | Code: https://github.com/amy-hyunji/RouterRetriever
Open Datasets | Yes | We use the provided training and test sets in the BEIR benchmark (Thakur et al. 2021).
Dataset Splits | Yes | We use the provided training and test sets in the BEIR benchmark (Thakur et al. 2021).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for experiments.
Software Dependencies | No | The paper mentions using Contriever and LoRA but does not specify software versions for the libraries or programming languages used in the implementation.
Experiment Setup | Yes | For training, we adopt the few-shot hyperparameters from Izacard et al. (2021): a learning rate of 1e-4, a batch size of 256 with in-batch negatives, and a maximum of 500 epochs with early stopping.
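The routing idea summarized above (a pilot embedding library per domain expert, with queries routed to the most similar expert) can be sketched minimally as follows. This is an illustrative assumption-laden sketch, not the authors' actual implementation: the function and variable names (`route`, `pilot_library`), the use of cosine similarity, and the toy 2-d embeddings are all hypothetical stand-ins for the pilot embeddings produced by Algorithm 1.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-d embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query_emb, pilot_library):
    """Pick the expert whose pilot embeddings best match the query.

    pilot_library maps an expert name to a list of pilot embeddings
    (illustrative stand-in for the paper's pilot embedding library).
    """
    return max(
        pilot_library,
        key=lambda name: max(cosine(query_emb, p) for p in pilot_library[name]),
    )

# Toy usage with 2-d embeddings (hypothetical domains and vectors).
pilots = {
    "science": [np.array([1.0, 0.0])],
    "finance": [np.array([0.0, 1.0])],
}
print(route(np.array([0.9, 0.1]), pilots))  # prints "science"
```

In the paper's setting, the selected expert's embedding model (a domain-specific LoRA-adapted Contriever) would then encode the query for retrieval; the sketch only shows the expert-selection step.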