Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
Authors: Ziyue Li, Tianyi Zhou
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement brought by MOEE to LLM-based embedding without further finetuning. |
| Researcher Affiliation | Academia | Ziyue Li, Tianyi Zhou Department of Computer Science University of Maryland, College Park EMAIL |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It primarily describes methods using natural language and mathematical equations. |
| Open Source Code | Yes | Project: https://github.com/tianyi-lab/MoE-Embedding |
| Open Datasets | Yes | Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement brought by MOEE to LLM-based embedding without further finetuning. |
| Dataset Splits | Yes | We conduct extensive evaluations of MOEE and compare it with baselines on the Massive Text Embedding Benchmark (MTEB) (Muennighoff et al., 2022), which covers a wide range of tasks designed to test embedding quality. ... For consistent and fair comparisons, we adopt the MTEB evaluation framework and use task-specific metrics... |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not list specific versions for any key software components or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | The final similarity score is then computed as: sim_final = sim_HS + α · sim_RW, where α is used as a hyperparameter to control the contribution of RW. To maximize the complementary strengths of HS and RW, we optimize α adaptively at test time. ... All models use per-token routing, but MOEE uses the last token's routing weights, which consistently outperform averaging across all tokens. For the hidden state (HS) embeddings, we use the last-layer hidden state of the last token. |
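The score combination quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each input already has a hidden-state (HS) embedding and a routing-weight (RW) embedding, and that both similarities are cosine similarities; the function names and the fixed `alpha` are illustrative (the paper tunes α adaptively at test time).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_sim(hs_a, hs_b, rw_a, rw_b, alpha=1.0):
    """sim_final = sim_HS + alpha * sim_RW (alpha weights the RW term)."""
    return cosine_sim(hs_a, hs_b) + alpha * cosine_sim(rw_a, rw_b)
```

For example, with identical HS vectors (sim_HS = 1), identical RW vectors (sim_RW = 1), and α = 0.5, `combined_sim` returns 1.5.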