Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning

Authors: Han Wu, Jie Yin

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and analyses on three KG benchmarks show that MoEMeta consistently outperforms existing baselines, achieving state-of-the-art performance.
Researcher Affiliation | Academia | (1) The University of Sydney, Australia; (2) Peking University, China; {han.wu, EMAIL}
Pseudocode | Yes | Algorithm 1: Meta-training Algorithm of MoEMeta
Open Source Code | Yes | The code is available at: https://github.com/alexhw15/MoEMeta.
Open Datasets | Yes | We evaluate our method on three widely used KG benchmarks specifically for few-shot relational learning: Nell-One and Wiki-One (Xiong et al., 2018), as well as FB15K-One (Ran et al., 2024). The data used in the experiments (Nell-One, Wiki-One, and FB15K-One) are all publicly available.
Dataset Splits | Yes | The training/validation/test splits include 51/5/11, 133/16/34, and 75/11/33 tasks on Nell-One, Wiki-One, and FB15K-One, respectively.
Hardware Specification | Yes | All models are implemented in PyTorch and trained on a single Tesla P100 GPU. For a fair comparison, we evaluate MoEMeta on an RTX 3090 GPU to keep consistency with baseline runtimes reported in RelAdapter (Ran et al., 2024).
Software Dependencies | No | All models are implemented in PyTorch and trained on a single Tesla P100 GPU. MoEMeta is trained using the Adam optimizer (batch size: 1,024, learning rate: 0.001).
Experiment Setup | Yes | Following prior work, we set the embedding dimension to 100 for Nell-One and FB15K-One, and 50 for Wiki-One, initialized using TransE-pretrained weights provided by GMatching (Xiong et al., 2018). The number of neighbors per entity is capped at 50 for aggregation. For MoE, each expert network is a two-layer MLP (hidden dimension: 64, ReLU). The gating network is also a two-layer MLP (hidden dimension: 64, output dimension: 1, and ReLU). We set the number of experts M as 32 and the number of selected experts N as 5. The margin γ in Eq. 16 is set to 1. MoEMeta is trained using the Adam optimizer (batch size: 1,024, learning rate: 0.001).
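The Dataset Splits row gives task counts only. The short sketch below totals them and reports the train/validation/test fractions. The names `SPLITS` and `split_summary` are illustrative, not from the MoEMeta code. The totals are 67 tasks for Nell-One, 183 for Wiki-One, and 119 for FB15K-One.

```python
# Task counts (train, validation, test) as reported in the Dataset Splits row.
SPLITS = {
    "Nell-One": (51, 5, 11),
    "Wiki-One": (133, 16, 34),
    "FB15K-One": (75, 11, 33),
}


def split_summary(splits):
    """Return (total, train_frac, valid_frac, test_frac) per dataset."""
    summary = {}
    for name, (train, valid, test) in splits.items():
        total = train + valid + test
        summary[name] = (total, train / total, valid / total, test / total)
    return summary


for name, (total, tr, va, te) in split_summary(SPLITS).items():
    print(f"{name}: {total} tasks, train {tr:.1%}, valid {va:.1%}, test {te:.1%}")
```

Training tasks make up roughly 76% of Nell-One, 73% of Wiki-One, and 63% of FB15K-One.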
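The hyperparameters reported in the Experiment Setup row can be collected in a single configuration object. The sketch below does this. The class name, field names, and `cap_neighbors` helper are hypothetical and are not taken from the MoEMeta repository. The source states only that neighbors are capped at 50, so using uniform random subsampling to enforce that cap is an assumption.

```python
import random
from dataclasses import dataclass

# Per-dataset embedding dimension, as reported (TransE-initialized).
EMBED_DIM = {"Nell-One": 100, "FB15K-One": 100, "Wiki-One": 50}


@dataclass(frozen=True)
class MoEMetaConfig:
    """Hypothetical container for the reported MoEMeta hyperparameters."""

    dataset: str
    max_neighbors: int = 50       # neighbors per entity used for aggregation
    expert_hidden: int = 64       # hidden width of each two-layer expert MLP
    gate_hidden: int = 64         # hidden width of the two-layer gating MLP
    gate_out: int = 1             # output dimension of the gating MLP
    num_experts: int = 32         # M
    top_n: int = 5                # N, experts selected per input
    margin: float = 1.0           # gamma in Eq. 16
    batch_size: int = 1024
    learning_rate: float = 1e-3   # Adam

    @property
    def embed_dim(self) -> int:
        return EMBED_DIM[self.dataset]


def cap_neighbors(neighbors, max_neighbors, rng):
    """Return at most `max_neighbors` neighbors.

    Assumption: uniform subsampling without replacement when over the cap.
    """
    neighbors = list(neighbors)
    if len(neighbors) <= max_neighbors:
        return neighbors
    return rng.sample(neighbors, max_neighbors)
```

Usage: `cap_neighbors(range(200), MoEMetaConfig("Wiki-One").max_neighbors, random.Random(0))` returns 50 distinct neighbor ids. A frozen dataclass keeps the reported values immutable, which avoids accidental drift between runs.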
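The Experiment Setup row specifies M = 32 two-layer MLP experts, of which N = 5 are selected, together with a two-layer gating MLP whose output dimension is 1. The following pure-Python sketch shows one way top-N expert selection of this kind can work. It is not the authors' implementation. Several details are assumptions: each expert maps the input back to its own width; the scalar gate is applied once per expert, to that expert's output, so that each expert receives one score; the selected scores are renormalized with a softmax; and weights are random with biases omitted.

```python
import math
import random


def two_layer_mlp(d_in, d_hidden, d_out, rng):
    """Random-weight Linear -> ReLU -> Linear (biases omitted for brevity)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d_out)]

    def forward(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return forward


class TopNMixture:
    """Mixture of M experts; only the N best-scored experts are combined."""

    def __init__(self, dim, num_experts=32, top_n=5, hidden=64, seed=0):
        rng = random.Random(seed)
        self.top_n = top_n
        # Assumption: each expert maps dim -> hidden -> dim.
        self.experts = [two_layer_mlp(dim, hidden, dim, rng) for _ in range(num_experts)]
        # Scalar-output gate; assumption: applied to each expert's output.
        self.gate = two_layer_mlp(dim, hidden, 1, rng)

    def __call__(self, x):
        # For simplicity, all M experts are evaluated here.
        outputs = [expert(x) for expert in self.experts]
        scores = [self.gate(out)[0] for out in outputs]
        # Indices of the N highest gate scores.
        chosen = sorted(range(len(scores)), key=lambda m: scores[m], reverse=True)[: self.top_n]
        # Softmax restricted to the selected experts (max-shifted for stability).
        peak = max(scores[m] for m in chosen)
        exps = [math.exp(scores[m] - peak) for m in chosen]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the selected experts' outputs.
        mixed = [0.0] * len(outputs[0])
        for w, m in zip(weights, chosen):
            for i, v in enumerate(outputs[m]):
                mixed[i] += w * v
        return mixed, chosen, weights
```

Usage: `TopNMixture(dim=100)([0.01] * 100)` returns the mixed 100-dimensional vector, the indices of the 5 selected experts, and their weights, which sum to 1. A production version would evaluate only the selected experts, which is where the compute savings of sparse routing come from.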
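Eq. 16 is not reproduced on this page, so its exact form is unknown here. The sketch below assumes it is the standard margin ranking (hinge) loss used in few-shot KG completion, written as the mean over pairs of max(0, γ − s(pos) + s(neg)) with γ = 1. Three further details are assumptions: higher scores mean more plausible triples, each positive is paired with one negative, and the per-pair losses are averaged rather than summed.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss max(0, margin - s_pos + s_neg) over paired triples.

    Assumes higher scores indicate more plausible triples.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("expected one negative score per positive score")
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# First pair already exceeds the margin (loss 0); second pair is tied (loss 1).
print(margin_ranking_loss([3.0, 0.5], [1.0, 0.5]))  # 0.5
```

A pair contributes zero loss once the positive outscores its negative by at least γ. Setting γ = 1 therefore fixes how large that separation must be before training stops pushing the two scores apart.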