An Experimental Design Perspective on Model-Based Reinforcement Learning
Authors: Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, Willie Neiswanger
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with a variety of simulated continuous control problems and show that our approach learns an optimal policy with up to 5–1,000× less data than model-based RL baselines and 10³–10⁵× less data than model-free RL baselines. |
| Researcher Affiliation | Academia | Viraj Mehta, Biswajit Paria, & Jeff Schneider Robotics Institute & Machine Learning Department Carnegie Mellon University Pittsburgh, PA, USA {virajm, bparia, schneide}@cs.cmu.edu Stefano Ermon & Willie Neiswanger Computer Science Department Stanford University Stanford, CA, USA {ermon, neiswanger}@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 Bayesian active reinforcement learning (BARL) using EIGτ |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code availability for the described methodology. |
| Open Datasets | Yes | Control Problems. We tackle five control problems: the standard underactuated pendulum swing-up problem (Pendulum-v0 from Brockman et al. (2016)), a cartpole swing-up problem, a 2D lava path navigation problem, a 2-DOF robot arm reacher problem with 8-dimensional state (Reacher-v2 from Brockman et al. (2016)), and a simplified beta tracking problem from plasma control (Char et al., 2019; Mehta et al., 2020) where the controller must maintain a fixed normalized plasma pressure using as GT dynamics a model learned similarly to Abbate et al. (2021). |
| Dataset Splits | No | The paper describes evaluation episodes and data collection during training but does not specify explicit train/validation/test dataset splits with percentages or counts, as the problems are continuous control environments rather than static datasets. |
| Hardware Specification | No | The paper mentions "when run on the author’s 24-core CPU machines" but does not provide specific CPU models, GPU details, or other hardware specifications. |
| Software Dependencies | No | The paper mentions using "PILCO (Deisenroth & Rasmussen, 2011)" and "PETS (Chua et al., 2018) as implemented by Pineda et al. (2021)" and "Gaussian process (GP) prior" but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 2: BARL hyperparameters used for each control problem. Table 4: Hyperparameters used for optimization in MPC procedure for control problems. |
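
To make the Pseudocode and Open Datasets rows above more concrete, the following is a minimal, hypothetical sketch, not the authors' released code (the paper does not provide one), of uncertainty-driven data collection for a GP dynamics model on Pendulum-v0, one of the five control problems listed above. It substitutes a simple predictive-variance acquisition for the EIG_τ acquisition of Algorithm 1 (BARL), and it assumes the classic `gym` API (versions where `Pendulum-v0` is still registered) together with scikit-learn's GP regressor, neither of which is specified in the paper.

```python
# Hypothetical sketch (not the authors' code): uncertainty-driven data collection
# for a GP dynamics model on Pendulum-v0. The predictive-variance acquisition is
# a simplified stand-in for the EIG_tau acquisition of Algorithm 1 (BARL).
# Assumes the classic gym API (pre-0.26), where Pendulum-v0 is still registered.
import numpy as np
import gym
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

env = gym.make("Pendulum-v0")
act_dim = env.action_space.shape[0]
act_high = float(env.action_space.high[0])
rng = np.random.default_rng(0)

# Seed dataset of (state, action) -> next-state transitions from random actions.
X, Y = [], []
obs = env.reset()
for _ in range(10):
    act = env.action_space.sample()
    next_obs, _, done, _ = env.step(act)
    X.append(np.concatenate([obs, act]))
    Y.append(next_obs)
    obs = env.reset() if done else next_obs

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)

for _ in range(40):  # small query budget, in the spirit of BARL's sample efficiency
    gp.fit(np.array(X), np.array(Y))
    # Candidate actions at the current state; pick the one whose predicted
    # next state the model is least certain about (a variance proxy, not EIG_tau).
    cand_acts = rng.uniform(-act_high, act_high, size=(128, act_dim))
    cand_X = np.hstack([np.tile(obs, (len(cand_acts), 1)), cand_acts])
    _, std = gp.predict(cand_X, return_std=True)
    std = np.asarray(std)
    score = std if std.ndim == 1 else std.sum(axis=1)
    act = cand_acts[int(np.argmax(score))]
    next_obs, _, done, _ = env.step(act)
    X.append(np.concatenate([obs, act]))
    Y.append(next_obs)
    obs = env.reset() if done else next_obs

print(f"collected {len(X)} transitions for the GP dynamics model")
```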
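
The Experiment Setup row points to the paper's Table 4, which gives the hyperparameters of the MPC procedure used to act with the learned model. The sketch below is a hedged, simplified random-shooting planner over the GP posterior mean; the horizon, number of sampled action sequences, and the Pendulum-style cost are illustrative placeholders rather than the paper's settings, and `env`, `gp`, `obs`, `act_dim`, and `act_high` are reused from the previous sketch.

```python
# Hypothetical sketch of a random-shooting MPC planner over the GP posterior mean,
# in the spirit of the MPC procedure whose hyperparameters appear in the paper's
# Table 4. Horizon, sample count, and the Pendulum-style cost are illustrative
# placeholders; env, gp, obs, act_dim, and act_high come from the previous sketch.
import numpy as np

def mpc_action(gp, obs, act_dim, act_high, horizon=10, n_seqs=200, rng=None):
    """Return the first action of the best random action sequence under the GP mean."""
    if rng is None:
        rng = np.random.default_rng()
    seqs = rng.uniform(-act_high, act_high, size=(n_seqs, horizon, act_dim))
    returns = np.zeros(n_seqs)
    states = np.tile(obs, (n_seqs, 1))
    for t in range(horizon):
        inputs = np.hstack([states, seqs[:, t, :]])
        states = gp.predict(inputs)  # roll out the posterior-mean dynamics
        # Pendulum-style cost: squared angle from upright plus a velocity penalty.
        cos_th, sin_th, thdot = states[:, 0], states[:, 1], states[:, 2]
        returns -= np.arctan2(sin_th, cos_th) ** 2 + 0.1 * thdot ** 2
    return seqs[int(np.argmax(returns)), 0]

# Example: act greedily with the planner from the current observation.
action = mpc_action(gp, obs, act_dim, act_high)
obs, reward, done, _ = env.step(action)
```

Random shooting is used here only for brevity; the optimizer and settings actually used in the paper's MPC procedure are those reported in its Table 4.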