Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
Authors: Ziyue Li, Tianyi Zhou
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement brought by MOEE to LLM-based embedding without further finetuning. |
| Researcher Affiliation | Academia | Ziyue Li, Tianyi Zhou Department of Computer Science University of Maryland, College Park EMAIL |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It primarily describes methods using natural language and mathematical equations. |
| Open Source Code | Yes | Project: https://github.com/tianyi-lab/MoE-Embedding |
| Open Datasets | Yes | Our experiments are conducted on 6 embedding tasks with 20 datasets from the Massive Text Embedding Benchmark (MTEB). The results demonstrate the significant improvement brought by MOEE to LLM-based embedding without further finetuning. |
| Dataset Splits | Yes | We conduct extensive evaluations of MOEE and compare it with baselines on the Massive Text Embedding Benchmark (MTEB) (Muennighoff et al., 2022), which covers a wide range of tasks designed to test embedding quality. ... For consistent and fair comparisons, we adopt the MTEB evaluation framework and use task-specific metrics... |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not list specific versions for any key software components or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | The final similarity score is then computed as: sim_final = sim_HS + α · sim_RW, where α is used as a hyperparameter to control the contribution of RW. To maximize the complementary strengths of HS and RW, we optimize α adaptively at test time. ... All models use per-token routing, but MOEE uses the last token's routing weights, which consistently outperform averaging across all tokens. For the hidden state (HS) embeddings, we use the last-layer hidden state of the last token. |
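The score combination quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each input already has a hidden-state (HS) embedding and a routing-weight (RW) embedding, and that both similarities are cosine similarities; the function names and the fixed `alpha` are illustrative (the paper tunes α adaptively at test time).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_sim(hs_a, hs_b, rw_a, rw_b, alpha=1.0):
    """sim_final = sim_HS + alpha * sim_RW (alpha weights the RW term)."""
    return cosine_sim(hs_a, hs_b) + alpha * cosine_sim(rw_a, rw_b)
```

For example, with identical HS vectors (sim_HS = 1), identical RW vectors (sim_RW = 1), and α = 0.5, `combined_sim` returns 1.5.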