Bidirectional Model-based Policy Optimization
Authors: Hang Lai, Jian Shen, Weinan Zhang, Yong Yu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that BMPO outperforms state-of-the-art model-based methods in terms of sample efficiency and asymptotic performance. We evaluate our BMPO and previous state-of-the-art algorithms (Haarnoja et al., 2018; Janner et al., 2019) on a range of continuous control benchmark tasks. Experiments demonstrate that BMPO achieves higher sample efficiency and better asymptotic performance compared with prior model-based methods which only use forward models. |
| Researcher Affiliation | Academia | 1Shanghai Jiao Tong University, Shanghai, China. Correspondence to: Hang Lai <laihang@apex.sjtu.edu.cn>, Weinan Zhang <wnzhang@sjtu.edu.cn>. |
| Pseudocode | Yes | Algorithm 1: Bidirectional Model-based Policy Optimization (BMPO). A hedged sketch of the bidirectional rollout step is given after this table. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate all the algorithms in six environments in total using OpenAI Gym (Brockman et al., 2016). Among them, Pendulum is one traditional control task, and Hopper, Walker2d, Ant are three complex MuJoCo tasks (Todorov et al., 2012). We additionally add two variants of MuJoCo tasks without early termination states, denoted as Hopper-NT and Walker2d-NT, which have been released as benchmarking environments for MBRL (Langlois et al., 2019). A minimal Gym setup sketch follows the table. |
| Dataset Splits | No | The paper mentions a 'validation loss' and 'validation error' in the context of model training, but it does not specify how the collected data were split into training, validation, and test sets (e.g., percentages or sample counts). Such fixed splits are common in supervised learning, whereas in RL the data are typically collected dynamically. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions) that would be needed to reproduce the experiments. |
| Experiment Setup | No | The paper states, 'All hyperparameters settings are provided in Appendix D.' This indicates that the specific experimental setup details, such as hyperparameter values, are not present in the main text of the paper. |
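
The following is a minimal sketch of the bidirectional rollout scheme that Algorithm 1 (BMPO) describes: branching short model rollouts both forward and backward from states sampled from the real replay buffer, and adding the imagined transitions to the data used for policy optimization. All names and the toy linear models below are illustrative placeholders, not the authors' implementation; the paper uses learned neural-network model ensembles and optimizes the policy with SAC.

```python
# Hedged sketch of BMPO-style bidirectional model rollouts.
# forward_model / backward_model / policy / backward_policy are placeholders,
# not the paper's learned ensembles or SAC policy.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 1


def forward_model(state, action):
    """Placeholder forward dynamics p(s' | s, a): a noisy linear step."""
    return state + 0.1 * action.sum() + 0.01 * rng.standard_normal(STATE_DIM)


def backward_model(state, action):
    """Placeholder backward dynamics q(s | s', a): a noisy inverse step."""
    return state - 0.1 * action.sum() + 0.01 * rng.standard_normal(STATE_DIM)


def policy(state):
    """Placeholder forward policy pi(a | s)."""
    return rng.uniform(-1.0, 1.0, ACTION_DIM)


def backward_policy(next_state):
    """Placeholder backward policy used to generate predecessor actions."""
    return rng.uniform(-1.0, 1.0, ACTION_DIM)


def bidirectional_rollout(start_state, horizon=5):
    """Roll the models forward and backward from the same start state and
    return the combined imagined transitions (the data-augmentation step)."""
    transitions = []

    # Forward branch: s -> s' using the forward policy and forward model.
    s = start_state
    for _ in range(horizon):
        a = policy(s)
        s_next = forward_model(s, a)
        transitions.append((s, a, s_next))
        s = s_next

    # Backward branch: s' -> s using the backward policy and backward model.
    s_next = start_state
    for _ in range(horizon):
        a = backward_policy(s_next)
        s_prev = backward_model(s_next, a)
        transitions.append((s_prev, a, s_next))
        s_next = s_prev

    return transitions


if __name__ == "__main__":
    # Branch rollouts from a (here synthetic) state; in BMPO the start state is
    # sampled from the real replay buffer and the imagined transitions are fed
    # to an off-policy learner such as SAC.
    start = rng.standard_normal(STATE_DIM)
    model_buffer = bidirectional_rollout(start, horizon=3)
    print(f"generated {len(model_buffer)} imagined transitions")
```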
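The benchmark tasks named in the Open Datasets row can be instantiated with OpenAI Gym as sketched below. The exact environment IDs and versions the authors used are an assumption here (they depend on the Gym release; the snippet assumes a 2019/2020-era `gym` with `mujoco-py` installed), and the '-NT' no-termination variants come from the separate MBRL benchmark of Langlois et al. (2019), so they are not shown.

```python
# Hedged sketch: instantiating the standard benchmark tasks from the paper.
# Environment IDs are assumed; MuJoCo tasks require a mujoco-py installation.
import gym

for env_id in ["Pendulum-v0", "Hopper-v2", "Walker2d-v2", "Ant-v2"]:
    env = gym.make(env_id)
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, "obs shape:", env.observation_space.shape, "reward:", reward)
    env.close()
```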