When to Update Your Model: Constrained Model-based Reinforcement Learning
Authors: Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CMLO on several continuous control benchmark tasks. The results show that CMLO learns much faster than other state-of-the-art methods and yields promising asymptotic performance compared with the model-free counterparts. Experiments show that CMLO surpasses other state-of-the-art methods and produces a boost when various policy optimization methods are employed. |
| Researcher Affiliation | Collaboration | Tianying Ji1, Yu Luo1, Fuchun Sun1, Mingxuan Jing2, Fengxiang He3, Wenbing Huang4,5 1 Department of Computer Science and Technology, Tsinghua University 2 Science & Technology on Integrated Information System Laboratory, Institute of Software, Chinese Academy of Sciences 3 JD Explore Academy, JD.com Inc 4 Gaoling School of Artificial Intelligence, Renmin University of China 5 Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China |
| Pseudocode | Yes | Algorithm 1: CMLO |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material. |
| Open Datasets | Yes | We evaluate CMLO and these baselines on six continuous control tasks in OpenAI Gym [6] with the MuJoCo [54] physics simulator, including HalfCheetah, Hopper, Walker2d, Swimmer, Ant, Humanoid. |
| Dataset Splits | No | The paper does not explicitly provide specific percentages or counts for training, validation, and test dataset splits in the main text. It mentions using 'standard full-length version of these tasks' but does not detail the data partitioning for each split. The checklist indicates details are in Appendix E, but the main text lacks this information. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix E. |
| Software Dependencies | No | The paper mentions various software components such as SAC [16], TRPO [44], PPO [46], and refers to 'pytorch-soft-actor-critic' [42]. However, it does not provide specific version numbers for these software dependencies required for reproducibility. |
| Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix E and our provided code. |