Bidirectional Model-based Policy Optimization
Authors: Hang Lai, Jian Shen, Weinan Zhang, Yong Yu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that BMPO outperforms state-of-the-art model-based methods in terms of sample efficiency and asymptotic performance. We evaluate our BMPO and previous state-of-the-art algorithms (Haarnoja et al., 2018; Janner et al., 2019) on a range of continuous control benchmark tasks. Experiments demonstrate that BMPO achieves higher sample efficiency and better asymptotic performance compared with prior model-based methods which only use forward models. |
| Researcher Affiliation | Academia | 1Shanghai Jiao Tong University, Shanghai, China. Correspondence to: Hang Lai <laihang@apex.sjtu.edu.cn>, Weinan Zhang <wnzhang@sjtu.edu.cn>. |
| Pseudocode | Yes | Algorithm 1: Bidirectional Model-based Policy Optimization (BMPO). A hedged sketch of the bidirectional rollout step is given after this table. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate all the algorithms in six environments in total using OpenAI Gym (Brockman et al., 2016). Among them, Pendulum is one traditional control task, and Hopper, Walker2d, Ant are three complex MuJoCo tasks (Todorov et al., 2012). We additionally add two variants of MuJoCo tasks without early termination states, denoted as Hopper-NT and Walker2d-NT, which have been released as benchmarking environments for MBRL (Langlois et al., 2019). A minimal Gym setup sketch follows the table. |
| Dataset Splits | No | The paper mentions a 'validation loss' and 'validation error' in the context of model training, but it does not specify how the collected data were split into training, validation, and test sets (e.g., percentages or sample counts). Such fixed splits are common in supervised learning, whereas in RL the data are typically collected dynamically. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions) that would be needed to reproduce the experiments. |
| Experiment Setup | No | The paper states, 'All hyperparameters settings are provided in Appendix D.' This indicates that the specific experimental setup details, such as hyperparameter values, are not present in the main text of the paper. |
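
The following is a minimal sketch of the bidirectional rollout scheme that Algorithm 1 (BMPO) describes: branching short model rollouts both forward and backward from states sampled from the real replay buffer, and adding the imagined transitions to the data used for policy optimization. All names and the toy linear models below are illustrative placeholders, not the authors' implementation; the paper uses learned neural-network model ensembles and optimizes the policy with SAC.

```python
# Hedged sketch of BMPO-style bidirectional model rollouts.
# forward_model / backward_model / policy / backward_policy are placeholders,
# not the paper's learned ensembles or SAC policy.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 1


def forward_model(state, action):
    """Placeholder forward dynamics p(s' | s, a): a noisy linear step."""
    return state + 0.1 * action.sum() + 0.01 * rng.standard_normal(STATE_DIM)


def backward_model(state, action):
    """Placeholder backward dynamics q(s | s', a): a noisy inverse step."""
    return state - 0.1 * action.sum() + 0.01 * rng.standard_normal(STATE_DIM)


def policy(state):
    """Placeholder forward policy pi(a | s)."""
    return rng.uniform(-1.0, 1.0, ACTION_DIM)


def backward_policy(next_state):
    """Placeholder backward policy used to generate predecessor actions."""
    return rng.uniform(-1.0, 1.0, ACTION_DIM)


def bidirectional_rollout(start_state, horizon=5):
    """Roll the models forward and backward from the same start state and
    return the combined imagined transitions (the data-augmentation step)."""
    transitions = []

    # Forward branch: s -> s' using the forward policy and forward model.
    s = start_state
    for _ in range(horizon):
        a = policy(s)
        s_next = forward_model(s, a)
        transitions.append((s, a, s_next))
        s = s_next

    # Backward branch: s' -> s using the backward policy and backward model.
    s_next = start_state
    for _ in range(horizon):
        a = backward_policy(s_next)
        s_prev = backward_model(s_next, a)
        transitions.append((s_prev, a, s_next))
        s_next = s_prev

    return transitions


if __name__ == "__main__":
    # Branch rollouts from a (here synthetic) state; in BMPO the start state is
    # sampled from the real replay buffer and the imagined transitions are fed
    # to an off-policy learner such as SAC.
    start = rng.standard_normal(STATE_DIM)
    model_buffer = bidirectional_rollout(start, horizon=3)
    print(f"generated {len(model_buffer)} imagined transitions")
```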
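The benchmark tasks named in the Open Datasets row can be instantiated with OpenAI Gym as sketched below. The exact environment IDs and versions the authors used are an assumption here (they depend on the Gym release; the snippet assumes a 2019/2020-era `gym` with `mujoco-py` installed), and the '-NT' no-termination variants come from the separate MBRL benchmark of Langlois et al. (2019), so they are not shown.

```python
# Hedged sketch: instantiating the standard benchmark tasks from the paper.
# Environment IDs are assumed; MuJoCo tasks require a mujoco-py installation.
import gym

for env_id in ["Pendulum-v0", "Hopper-v2", "Walker2d-v2", "Ant-v2"]:
    env = gym.make(env_id)
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, "obs shape:", env.observation_space.shape, "reward:", reward)
    env.close()
```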