Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Off-policy Reinforcement Learning with Model-based Exploration Augmentation

Authors: Likun Wang, Xiangteng Zhang, Yinuo Wang, Guojian Zhan, Wenxuan Wang, Haoyu Gao, Jingliang Duan, Shengbo Eben Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on OpenAI Gym and DeepMind Control Suite reveal that MoGE effectively bridges exploration and policy learning, leading to remarkable gains in both sample efficiency and performance across complex control tasks. Experiments on standard continuous control benchmarks, including OpenAI Gym [5] and DeepMind Control Suite [61] demonstrate that MoGE, as a plug-in module, consistently improves both the final performance and the sample efficiency of baseline off-policy RL algorithms.
Researcher Affiliation	Academia	(1) School of Vehicle and Mobility & College of AI, Tsinghua University; (2) School of Mechanical Engineering, University of Science and Technology Beijing
Pseudocode	Yes	Algorithm 1: Off-policy RL training framework with MoGE
Open Source Code	No	We will make the full code publicly available when the paper is accepted, but the core code is public with open access in Appendix B.
Open Datasets	Yes	Empirical results on OpenAI Gym and DeepMind Control Suite reveal that MoGE effectively bridges exploration and policy learning, leading to remarkable gains in both sample efficiency and performance across complex control tasks. Experiments on standard continuous control benchmarks, including OpenAI Gym [5] and DeepMind Control Suite [61] demonstrate that MoGE, as a plug-in module, consistently improves both the final performance and the sample efficiency of baseline off-policy RL algorithms.
Dataset Splits	No	The paper mentions evaluating on benchmarks like OpenAI Gym and DeepMind Control Suite, and states "the total training step size for all experiments is set at 1.5 million, with the results of all experiments averaged over 3 random seeds." However, it does not explicitly provide specific train/test/validation split percentages or counts for these datasets, nor does it refer to predefined splits with citations for reproducibility of data partitioning, other than using the standard benchmark tasks.
Hardware Specification	Yes	The experiments are performed on an AMD Ryzen Threadripper 3960X 24-Core Processor and an NVIDIA GeForce RTX 4090 GPU.
Software Dependencies	No	The paper mentions using "Optimizer Adam" in the hyperparameter table but does not specify version numbers for Python, PyTorch, CUDA, or other key software libraries or dependencies. Therefore, a fully reproducible software environment cannot be established from the provided information.
Experiment Setup	Yes	All hyperparameters are aligned with standard implementations, and the configuration details are documented in the Appendix B. In MoGE, we adopt the hyperparameter settings without additional fine-tuning and use the same configuration across all previously demonstrated tasks, which are listed in Table 3.
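
The Software Dependencies row above is marked "No" because no version numbers are reported. As a rough illustration of what such a record could look like, the sketch below collects the interpreter version and the installed versions of a few packages using only the Python standard library. The function name `snapshot_environment` and the package list are hypothetical and are not taken from the paper; the sketch also does not capture the CUDA version, which would need a separate check.

```python
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Return a dict mapping 'python' and each package name to a version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            # Look up the version of the installed distribution, if any.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Hypothetical package list; the paper does not name its dependencies.
    for name, version in snapshot_environment(["torch", "numpy", "gymnasium"]).items():
        print(f"{name}=={version}")
```

Saving this output alongside experiment results is one simple way to make the software environment easier to reconstruct later.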