Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning

Authors: Mianchu Wang, Yue Jin, Giovanni Montana

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, LOM outperforms existing methods on standard D4RL benchmarks and demonstrates its effectiveness in complex, multi-modal scenarios. (...) 6 EXPERIMENTAL RESULTS
Researcher Affiliation	Academia	Mianchu Wang, Yue Jin, Giovanni Montana; University of Warwick; The Alan Turing Institute; EMAIL
Pseudocode	Yes	Algorithm 1 Weighted imitation learning on one mode (LOM).
Open Source Code	Yes	The code has been open sourced. GitHub repository: https://github.com/MianchuWang/LOM
Open Datasets	Yes	We evaluate LOM on three MuJoCo locomotion tasks from the D4RL benchmark (Fu et al., 2020): halfcheetah, hopper, and walker2d.
Dataset Splits	Yes	Each environment contains five dataset types: (i) medium: 1M samples from a policy trained to approximately one-third of expert performance; (ii) medium-replay: the replay buffer of a policy trained to match the performance of the medium agent (0.2M for halfcheetah, 0.4M for hopper, 0.3M for walker2d); (iii) medium-expert: a 50-50 split of medium and expert data (just under 2M samples); (iv) expert: 1M samples from a fully trained SAC policy (Haarnoja et al., 2018); and (v) full-replay: 1M samples from the final replay buffer of an expert policy.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper, only the type of tasks (MuJoCo locomotion) and general terms like "robot arm".
Software Dependencies	No	No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper.
Experiment Setup	Yes	Table 3: Hyperparameters used in the experiments. (includes M, β, C, ρ, update_delay, network architectures with dimensions for π_ρ, π_θ, Q_ψ, Q_ϕ)
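
The evaluation grid described under Open Datasets and Dataset Splits (three environments, five dataset types each) can be enumerated with a short sketch. The environment names, dataset types, and medium-replay sizes are taken from the rows above. The `dataset_names` helper, the `env-type-version` naming pattern, and the default `v2` version suffix are assumptions made for illustration; they are not stated in the paper excerpts quoted here.

```python
from itertools import product

# Environments and dataset types as listed in the table above.
ENVIRONMENTS = ("halfcheetah", "hopper", "walker2d")
DATASET_TYPES = ("medium", "medium-replay", "medium-expert", "expert", "full-replay")

# Medium-replay buffer sizes quoted in the Dataset Splits row.
MEDIUM_REPLAY_SIZES = {"halfcheetah": 200_000, "hopper": 400_000, "walker2d": 300_000}


def dataset_names(version="v2"):
    """Return one identifier per (environment, dataset type) pair.

    The "env-type-version" pattern and the "v2" default are assumptions,
    not something the quoted excerpts confirm.
    """
    return [
        f"{env}-{kind}-{version}"
        for env, kind in product(ENVIRONMENTS, DATASET_TYPES)
    ]


if __name__ == "__main__":
    for name in dataset_names():
        print(name)
```

Enumerating the grid this way gives 3 × 5 = 15 environment and dataset combinations, matching the description in the Dataset Splits row.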