Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
Authors: Mianchu Wang, Yue Jin, Giovanni Montana
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, LOM outperforms existing methods on standard D4RL benchmarks and demonstrates its effectiveness in complex, multi-modal scenarios. (...) 6 EXPERIMENTAL RESULTS |
| Researcher Affiliation | Academia | Mianchu Wang, Yue Jin, Giovanni Montana; University of Warwick; The Alan Turing Institute; EMAIL |
| Pseudocode | Yes | Algorithm 1 Weighted imitation learning on one mode (LOM). |
| Open Source Code | Yes | The code has been open sourced. (Footnote 1) GitHub repository: https://github.com/MianchuWang/LOM |
| Open Datasets | Yes | We evaluate LOM on three MuJoCo locomotion tasks from the D4RL benchmark (Fu et al., 2020): halfcheetah, hopper, and walker2d. |
| Dataset Splits | Yes | Each environment contains five dataset types: (i) medium: 1M samples from a policy trained to approximately one-third of expert performance; (ii) medium-replay: the replay buffer of a policy trained to match the performance of the medium agent (0.2M for halfcheetah, 0.4M for hopper, 0.3M for walker2d); (iii) medium-expert: a 50-50 split of medium and expert data (just under 2M samples); (iv) expert: 1M samples from a fully trained SAC policy (Haarnoja et al., 2018); and (v) full-replay: 1M samples from the final replay buffer of an expert policy. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned in the paper, only the type of tasks (MuJoCo locomotion) and general terms like "robot arm". |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are provided in the paper. |
| Experiment Setup | Yes | Table 3: Hyperparameters used in the experiments. (includes M, β, C, ρ, update_delay, network architectures with dimensions for πρ, πθ, Qψ, Qϕ) |
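
The Open Datasets and Dataset Splits rows together describe a grid of three environments by five dataset types. As a minimal sketch, the grid can be enumerated as below. The `<env>-<type>-<version>` naming pattern follows the usual D4RL convention, but the version suffix is an assumption (the rows above do not state which dataset version was used), and the function name `d4rl_dataset_names` is hypothetical.

```python
from itertools import product

# Environments and dataset types as listed in the table above.
ENVIRONMENTS = ["halfcheetah", "hopper", "walker2d"]
DATASET_TYPES = ["medium", "medium-replay", "medium-expert", "expert", "full-replay"]


def d4rl_dataset_names(version="v2"):
    """Return the 15 environment/dataset-type combinations as name strings.

    `version` is an assumption: the summary does not say which D4RL
    dataset version the paper used, so adjust it to match the paper's code.
    """
    return [
        f"{env}-{dataset_type}-{version}"
        for env, dataset_type in product(ENVIRONMENTS, DATASET_TYPES)
    ]


if __name__ == "__main__":
    for name in d4rl_dataset_names():
        print(name)
```

The names are plain strings only; whether each one resolves to a downloadable dataset depends on the installed D4RL release and should be checked against the linked repository.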