Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
RouterRetriever: Routing over a Mixture of Expert Embedding Models
Authors: Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation on the BEIR benchmark demonstrates that ROUTERRETRIEVER outperforms both models trained on MSMARCO (+2.1 absolute nDCG@10) and multi-task models (+3.2). This is achieved by employing our routing mechanism, which surpasses other routing techniques (+1.8 on average). Furthermore, the benefit generalizes well to other datasets, even in the absence of a specific expert on the dataset. ROUTERRETRIEVER is the first work to demonstrate the advantages of routing over a mixture of domain-specific expert embedding models as an alternative to a single, general-purpose embedding model, especially when retrieving from diverse, specialized domains. |
| Researcher Affiliation | Collaboration | Hyunji Lee (1*), Luca Soldaini (2), Arman Cohan (2,3), Minjoon Seo (1), Kyle Lo (2) — 1: KAIST AI, 2: Allen Institute for AI, 3: Yale University. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Constructing Pilot Embedding Library |
| Open Source Code | Yes | Code https://github.com/amy-hyunji/RouterRetriever |
| Open Datasets | Yes | We use the provided training and test sets in the BEIR benchmark (Thakur et al. 2021). |
| Dataset Splits | Yes | We use the provided training and test sets in the BEIR benchmark (Thakur et al. 2021). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | The paper mentions using Contriever and LoRA as methods but does not specify software versions for libraries or programming languages used for implementation. |
| Experiment Setup | Yes | For training, we adopt the few-shot hyperparameters from Izacard et al. (2021): a learning rate of 1e-4, a batch size of 256 with in-batch negatives, and a maximum of 500 epochs with early stopping. |
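The training hyperparameters quoted above (learning rate 1e-4, batch size 256 with in-batch negatives, up to 500 epochs with early stopping) can be collected into a small configuration sketch. This is an illustrative assumption, not the authors' actual code; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpertTrainingConfig:
    """Hypothetical container for the few-shot hyperparameters
    reported in the paper (adopted from Izacard et al. 2021)."""
    learning_rate: float = 1e-4
    batch_size: int = 256          # with in-batch negatives
    max_epochs: int = 500          # upper bound; early stopping applies
    early_stopping: bool = True


# Example: instantiate with the reported defaults.
config = ExpertTrainingConfig()
print(config.learning_rate, config.batch_size, config.max_epochs)
```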