On Effective Scheduling of Model-based Reinforcement Learning

Authors: Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct experiments to verify the effectiveness of AutoMBPO and provide comprehensive empirical analysis.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Huawei Noah's Ark Lab
Pseudocode | Yes | Algorithm 1: AutoMBPO
Open Source Code | Yes | In the supplemental material.
Open Datasets | Yes | The compared methods are evaluated on three MuJoCo [29] (Hopper, Ant, Humanoid) and three PyBullet [6] (HopperBullet, Walker2dBullet, HalfCheetahBullet) continuous control tasks as our target-MDPs.
Dataset Splits | No | In practical implementation, an early stopping trick is adopted in training the model ensemble. To be more specific, when training each individual model, a hold-out dataset will be created, and the training will early stop if the loss evaluated on the hold-out data does not decrease.
Hardware Specification | Yes | All experiments in this paper are conducted on a single NVIDIA 2080 Ti GPU.
Software Dependencies | Yes | We implement AutoMBPO based on PyTorch 1.5.0.
Experiment Setup | Yes | The three hyperparameters (real ratio, model training frequency, and policy training iteration) are scheduled on all tasks, while the rollout length is not scheduled on the PyBullet tasks... More experimental details and hyperparameter configuration for the hyper-controller can be found in Appendix F and H.
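The Open Datasets entry above names the six control tasks but not the exact environment IDs or versions. A minimal sketch of instantiating them, assuming Gym with MuJoCo installed and the `pybullet_envs` registration module; the specific IDs and version suffixes are my assumptions, since the excerpt does not state them:

```python
import gym
import pybullet_envs  # noqa: F401 -- importing registers the *BulletEnv-v0 task IDs

# MuJoCo tasks (version suffixes are assumed, not taken from the paper).
mujoco_envs = [gym.make(name) for name in ("Hopper-v2", "Ant-v2", "Humanoid-v2")]

# PyBullet tasks.
bullet_envs = [gym.make(name) for name in
               ("HopperBulletEnv-v0", "Walker2DBulletEnv-v0", "HalfCheetahBulletEnv-v0")]
```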
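The Dataset Splits entry refers to the hold-out early-stopping trick used when fitting the dynamics-model ensemble, rather than a conventional train/validation/test split. A minimal sketch of that trick for one ensemble member, assuming a generic PyTorch regression model; all names (`model`, `train_loader`, `holdout_x`, `holdout_y`, `patience`) are illustrative, not taken from the AutoMBPO code:

```python
import torch
import torch.nn.functional as F


def train_with_holdout_early_stop(model, optimizer, train_loader,
                                  holdout_x, holdout_y,
                                  max_epochs=200, patience=5):
    """Train one ensemble member; stop when the hold-out loss stops improving.

    All names here are illustrative, not taken from the AutoMBPO code.
    """
    best_loss, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = F.mse_loss(model(x), y)
            loss.backward()
            optimizer.step()

        # Evaluate on the hold-out set carved out of the collected transitions.
        model.eval()
        with torch.no_grad():
            holdout_loss = F.mse_loss(model(holdout_x), holdout_y).item()

        if holdout_loss < best_loss:
            best_loss, epochs_without_improvement = holdout_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # hold-out loss no longer decreasing: early stop
    return best_loss
```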
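The Experiment Setup entry lists the MBPO hyperparameters that the learned hyper-controller schedules during training. A rough sketch of how such a per-epoch schedule could be represented and consumed; the field names and numeric values are placeholders, not the paper's configuration (the actual settings are in Appendix F and H of the paper):

```python
from dataclasses import dataclass


@dataclass
class MBPOSchedule:
    """One hyper-controller decision; field names and values are illustrative."""
    real_ratio: float        # fraction of real (vs. model-generated) data in policy updates
    model_train_freq: int    # environment steps between dynamics-model retrainings
    policy_train_iters: int  # policy training iterations per environment step
    rollout_length: int      # model rollout length (not scheduled on the PyBullet tasks)


def hyper_controller(epoch: int) -> MBPOSchedule:
    # Stand-in for the learned controller: returns fixed placeholder values
    # just to show how the schedule would be consumed each epoch.
    return MBPOSchedule(real_ratio=0.1, model_train_freq=250,
                        policy_train_iters=20, rollout_length=1)


for epoch in range(300):  # placeholder number of training epochs
    schedule = hyper_controller(epoch)
    # ... retrain the model every `schedule.model_train_freq` steps, mix real and
    # model-generated data with ratio `schedule.real_ratio`, update the policy for
    # `schedule.policy_train_iters` iterations, and roll out `schedule.rollout_length`
    # steps in the learned model.
```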