On Effective Scheduling of Model-based Reinforcement Learning

Authors: Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct experiments to verify the effectiveness of AutoMBPO and provide comprehensive empirical analysis.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab
Pseudocode | Yes | Algorithm 1: AutoMBPO
Open Source Code | Yes | In the supplemental material.
Open Datasets | Yes | The compared methods are evaluated on three MuJoCo [29] (Hopper, Ant, Humanoid) and three PyBullet [6] (HopperBullet, Walker2dBullet, HalfCheetahBullet) continuous control tasks as our target-MDPs.
Dataset Splits | No | In practical implementation, an early stopping trick is adopted in training the model ensemble. To be more specific, when training each individual model, a hold-out dataset will be created, and the training will early stop if the loss evaluated on the hold-out data does not decrease.
Hardware Specification | Yes | All experiments in this paper are conducted on a single NVIDIA 2080 Ti GPU.
Software Dependencies | Yes | We implement AutoMBPO based on PyTorch 1.5.0.
Experiment Setup | Yes | The three hyperparameters (real ratio, model training frequency, and policy training iteration) are scheduled on all tasks, while the rollout length is not scheduled on the PyBullet tasks... More experimental details and hyperparameter configuration for the hyper-controller can be found in Appendix F and H.
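The Open Datasets entry above names the six control tasks but not the exact environment IDs or versions. A minimal sketch of instantiating them, assuming Gym with MuJoCo installed and the `pybullet_envs` registration module; the specific IDs and version suffixes are my assumptions, since the excerpt does not state them:

```python
import gym
import pybullet_envs  # noqa: F401 -- importing registers the *BulletEnv-v0 task IDs

# MuJoCo tasks (version suffixes are assumed, not taken from the paper).
mujoco_envs = [gym.make(name) for name in ("Hopper-v2", "Ant-v2", "Humanoid-v2")]

# PyBullet tasks.
bullet_envs = [gym.make(name) for name in
               ("HopperBulletEnv-v0", "Walker2DBulletEnv-v0", "HalfCheetahBulletEnv-v0")]
```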
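The Dataset Splits entry refers to the hold-out early-stopping trick used when fitting the dynamics-model ensemble, rather than a conventional train/validation/test split. A minimal sketch of that trick for one ensemble member, assuming a generic PyTorch regression model; all names (`model`, `train_loader`, `holdout_x`, `holdout_y`, `patience`) are illustrative, not taken from the AutoMBPO code:

```python
import torch
import torch.nn.functional as F


def train_with_holdout_early_stop(model, optimizer, train_loader,
                                  holdout_x, holdout_y,
                                  max_epochs=200, patience=5):
    """Train one ensemble member; stop when the hold-out loss stops improving.

    All names here are illustrative, not taken from the AutoMBPO code.
    """
    best_loss, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = F.mse_loss(model(x), y)
            loss.backward()
            optimizer.step()

        # Evaluate on the hold-out set carved out of the collected transitions.
        model.eval()
        with torch.no_grad():
            holdout_loss = F.mse_loss(model(holdout_x), holdout_y).item()

        if holdout_loss < best_loss:
            best_loss, epochs_without_improvement = holdout_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # hold-out loss no longer decreasing: early stop
    return best_loss
```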
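The Experiment Setup entry lists the MBPO hyperparameters that the learned hyper-controller schedules during training. A rough sketch of how such a per-epoch schedule could be represented and consumed; the field names and numeric values are placeholders, not the paper's configuration (the actual settings are in Appendix F and H of the paper):

```python
from dataclasses import dataclass


@dataclass
class MBPOSchedule:
    """One hyper-controller decision; field names and values are illustrative."""
    real_ratio: float        # fraction of real (vs. model-generated) data in policy updates
    model_train_freq: int    # environment steps between dynamics-model retrainings
    policy_train_iters: int  # policy training iterations per environment step
    rollout_length: int      # model rollout length (not scheduled on the PyBullet tasks)


def hyper_controller(epoch: int) -> MBPOSchedule:
    # Stand-in for the learned controller: returns fixed placeholder values
    # just to show how the schedule would be consumed each epoch.
    return MBPOSchedule(real_ratio=0.1, model_train_freq=250,
                        policy_train_iters=20, rollout_length=1)


for epoch in range(300):  # placeholder number of training epochs
    schedule = hyper_controller(epoch)
    # ... retrain the model every `schedule.model_train_freq` steps, mix real and
    # model-generated data with ratio `schedule.real_ratio`, update the policy for
    # `schedule.policy_train_iters` iterations, and roll out `schedule.rollout_length`
    # steps in the learned model.
```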