Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Authors: Likun Wang, Xiangteng Zhang, Yinuo Wang, Guojian Zhan, Wenxuan Wang, Haoyu Gao, Jingliang Duan, Shengbo Eben Li
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on OpenAI Gym and DeepMind Control Suite reveal that MoGE effectively bridges exploration and policy learning, leading to remarkable gains in both sample efficiency and performance across complex control tasks. Experiments on standard continuous control benchmarks, including OpenAI Gym [5] and DeepMind Control Suite [61], demonstrate that MoGE, as a plug-in module, consistently improves both the final performance and the sample efficiency of baseline off-policy RL algorithms. |
| Researcher Affiliation | Academia | 1School of Vehicle and Mobility & College of AI, Tsinghua University 2School of Mechanical Engineering, University of Science and Technology Beijing |
| Pseudocode | Yes | Algorithm 1 Off-policy RL training framework with MoGE |
| Open Source Code | No | We will make the full code publicly available when the paper is accepted, but the core code is public with open access in Appendix B. |
| Open Datasets | Yes | Empirical results on OpenAI Gym and DeepMind Control Suite reveal that MoGE effectively bridges exploration and policy learning, leading to remarkable gains in both sample efficiency and performance across complex control tasks. Experiments on standard continuous control benchmarks, including OpenAI Gym [5] and DeepMind Control Suite [61], demonstrate that MoGE, as a plug-in module, consistently improves both the final performance and the sample efficiency of baseline off-policy RL algorithms. |
| Dataset Splits | No | The paper mentions evaluating on benchmarks like OpenAI Gym and DeepMind Control Suite, and states "the total training step size for all experiments is set at 1.5 million, with the results of all experiments averaged over 3 random seeds." However, it does not explicitly provide specific train/test/validation split percentages or counts for these datasets, nor does it refer to predefined splits with citations for reproducibility of data partitioning, other than using the standard benchmark tasks. |
| Hardware Specification | Yes | The experiments are performed on an AMD Ryzen Threadripper 3960X 24-Core Processor and an NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using "Optimizer Adam" in the hyperparameter table but does not specify version numbers for Python, PyTorch, CUDA, or other key software libraries or dependencies. Therefore, a fully reproducible software environment cannot be established from the provided information. |
| Experiment Setup | Yes | All hyperparameters are aligned with standard implementations, and the configuration details are documented in the Appendix B. In MoGE, we adopt the hyperparameter settings without additional fine-tuning and use the same configuration across all previously demonstrated tasks, which are listed in Table 3. |
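
The Software Dependencies row notes that the paper reports no version numbers for Python, PyTorch, CUDA, or other libraries. As an illustration only, and not something taken from the paper, the sketch below shows one way such versions could be recorded next to experiment results using Python's standard library. The function name `snapshot_environment` and the package names passed to it are hypothetical.

```python
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None. The interpreter version
    is always included under the key "python".
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Package names are illustrative assumptions, not taken from the paper.
    for name, version in snapshot_environment(["torch", "numpy", "gym"]).items():
        print(f"{name}=={version}")
```

Saving this output with each run would let a reader rebuild a matching environment, which is the information the classifier found missing.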