Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free

Authors: Ziyue Li, Tianyi Zhou

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement brought by MOEE to LLM-based embedding without further finetuning. |
| Researcher Affiliation | Academia | Ziyue Li, Tianyi Zhou, Department of Computer Science, University of Maryland, College Park (EMAIL) |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks; it describes methods in natural language and mathematical equations. |
| Open Source Code | Yes | Project: https://github.com/tianyi-lab/MoE-Embedding |
| Open Datasets | Yes | Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). |
| Dataset Splits | Yes | We conduct extensive evaluations of MOEE and compare it with baselines on the Massive Text Embedding Benchmark (MTEB) (Muennighoff et al., 2022), which covers a wide range of tasks designed to test embedding quality. ... For consistent and fair comparisons, we adopt the MTEB evaluation framework and use task-specific metrics... |
| Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, used for running the experiments. |
| Software Dependencies | No | The paper does not list specific versions for any key software components or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | The final similarity score is then computed as: sim_final = sim_HS + α · sim_RW, where α is used as a hyperparameter to control the contribution of RW. To maximize the complementary strengths of HS and RW, we optimize α adaptively at test time. ... All models use per-token routing, but MOEE uses the last token's routing weights, which consistently outperform averaging across all tokens. For the hidden state (HS) embeddings, we use the last-layer hidden state of the last token. |
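The combination rule quoted in the Experiment Setup row can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the helper names and example vectors are invented, and in the actual method the HS vector comes from the last-layer hidden state of the last token while the RW vector comes from the MoE router's weights, with α tuned adaptively at test time rather than fixed.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def moee_similarity(hs_a, hs_b, rw_a, rw_b, alpha=1.0):
    """Combine hidden-state (HS) and routing-weight (RW) similarities:
        sim_final = sim_HS + alpha * sim_RW
    where alpha controls the contribution of the RW term.
    """
    return cosine_sim(hs_a, hs_b) + alpha * cosine_sim(rw_a, rw_b)


# Toy example: identical HS and RW vectors give sim_HS = sim_RW = 1,
# so sim_final = 1 + alpha.
score = moee_similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], alpha=0.5)
```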