When to Update Your Model: Constrained Model-based Reinforcement Learning

Authors: Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate CMLO on several continuous control benchmark tasks. The results show that CMLO learns much faster than other state-of-the-art methods and yields promising asymptotic performance compared with the model-free counterparts. Experiments show that CMLO surpasses other state-of-the-art methods and produces a boost when various policy optimization methods are employed.
Researcher Affiliation | Collaboration | Tianying Ji1, Yu Luo1, Fuchun Sun1, Mingxuan Jing2, Fengxiang He3, Wenbing Huang4,5. 1 Department of Computer Science and Technology, Tsinghua University; 2 Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences; 3 JD Explore Academy, JD.com Inc.; 4 Gaoling School of Artificial Intelligence, Renmin University of China; 5 Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
Pseudocode | Yes | Algorithm 1: CMLO (a generic, illustrative model-update loop is sketched after the table)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material.
Open Datasets | Yes | We evaluate CMLO and these baselines on six continuous control tasks in OpenAI Gym [6] with the MuJoCo [54] physics simulator, including HalfCheetah, Hopper, Walker2d, Swimmer, Ant, and Humanoid. (An illustrative environment-setup sketch follows the table.)
Dataset Splits | No | The paper does not give explicit percentages or counts for training, validation, and test splits in the main text. It mentions using the 'standard full-length version of these tasks' but does not detail how the collected data are partitioned; the checklist points to Appendix E, but the main text itself lacks this information.
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix E.
Software Dependencies | No | The paper mentions software components such as SAC [16], TRPO [44], and PPO [46], and refers to the 'pytorch-soft-actor-critic' implementation [42], but it does not pin version numbers for these dependencies, which reproducibility would require. (A version-recording sketch follows the table.)
Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix E and our provided code.
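
The paper's Algorithm 1 is not reproduced here. As a rough, hypothetical illustration of what an event-triggered "when to update the model" loop can look like in generic model-based RL code, the sketch below alternates data collection with a model refit that only fires when a simple shift statistic crosses a threshold. Every name, the toy dynamics, and the trigger rule are assumptions for illustration, not the authors' CMLO algorithm.

```python
# Hypothetical sketch of an event-triggered model-update loop (not the paper's Algorithm 1).
# A linear "model" of a toy 1-D system is refit only when the mean prediction error on
# freshly collected transitions exceeds a threshold -- the generic idea behind deciding
# *when* to update the model rather than refitting it on a fixed schedule.
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    """Unknown environment dynamics (toy stand-in for a MuJoCo task)."""
    return 0.9 * s + 0.5 * a + 0.05 * rng.normal()

def fit_model(transitions):
    """Least-squares fit of s' ~ w0*s + w1*a from observed transitions."""
    X = np.array([[s, a] for s, a, _ in transitions])
    y = np.array([sp for _, _, sp in transitions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def model_error(w, transitions):
    """Mean absolute one-step prediction error of the current model."""
    X = np.array([[s, a] for s, a, _ in transitions])
    y = np.array([sp for _, _, sp in transitions])
    return float(np.mean(np.abs(X @ w - y)))

buffer, recent = [], []
w = np.zeros(2)        # current dynamics-model parameters
threshold = 0.08       # hypothetical trigger threshold on model error/shift
s = 0.0

for t in range(2000):
    a = rng.uniform(-1.0, 1.0)          # placeholder for the current policy's action
    sp = true_step(s, a)
    buffer.append((s, a, sp))
    recent.append((s, a, sp))
    s = sp

    # Event-triggered model update: refit only when the recent data indicate the
    # model no longer explains the observed transitions well enough.
    if len(recent) >= 50 and model_error(w, recent) > threshold:
        w = fit_model(buffer)
        recent.clear()
    # ... policy optimization with the (possibly stale) model would go here ...
```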
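
The "Open Datasets" row refers to standard OpenAI Gym / MuJoCo benchmarks rather than a fixed dataset. Below is a minimal sketch of instantiating and rolling out the six named tasks with the gym API; the environment IDs (the commonly used '-v2' variants), the classic pre-0.26 gym API, and the random placeholder policy are assumptions, not details taken from the paper.

```python
# Minimal rollout sketch for the six benchmark tasks named in the paper.
# Assumes classic gym (< 0.26) with mujoco-py backends and '-v2' environment IDs;
# newer gym/gymnasium releases change the reset()/step() signatures and IDs.
import gym

TASKS = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2",
         "Swimmer-v2", "Ant-v2", "Humanoid-v2"]

def random_rollout(env_id, steps=1000):
    """Run one step-capped rollout with random actions and return the episode return."""
    env = gym.make(env_id)
    obs = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        action = env.action_space.sample()   # stand-in for a learned policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    env.close()
    return total_reward

if __name__ == "__main__":
    for task in TASKS:
        print(task, random_rollout(task))
```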
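
Since the "Software Dependencies" row notes that no package versions are reported, one practical step when rerunning the released code is to record the versions actually installed. The snippet below is a small sketch under that assumption; the package list is a guess at the obvious candidates, not a list confirmed by the paper.

```python
# Record the installed versions of the likely dependencies when reproducing the results.
# The package names are assumptions about the stack (PyTorch, Gym, MuJoCo bindings, NumPy);
# adjust them to whatever the released code actually imports.
from importlib.metadata import version, PackageNotFoundError

CANDIDATES = ["torch", "gym", "mujoco-py", "numpy"]

for name in CANDIDATES:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```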