Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
An Experimental Design Perspective on Model-Based Reinforcement Learning
Authors: Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, Willie Neiswanger
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with a variety of simulated continuous control problems and show that our approach learns an optimal policy with up to 5–1,000× less data than model-based RL baselines and 10³–10⁵× less data than model-free RL baselines. |
| Researcher Affiliation | Academia | Viraj Mehta, Biswajit Paria, & Jeff Schneider Robotics Institute & Machine Learning Department Carnegie Mellon University Pittsburgh, PA, USA EMAIL Stefano Ermon & Willie Neiswanger Computer Science Department Stanford University Stanford, CA, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Bayesian active reinforcement learning (BARL) using EIGτ |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code availability for the described methodology. |
| Open Datasets | Yes | Control Problems. We tackle five control problems: the standard underactuated pendulum swing-up problem (Pendulum-v0 from Brockman et al. (2016)), a cartpole swing-up problem, a 2D lava path navigation problem, a 2-DOF robot arm reacher problem with 8-dimensional state (Reacher-v2 from Brockman et al. (2016)), and a simplified beta tracking problem from plasma control (Char et al., 2019; Mehta et al., 2020) where the controller must maintain a fixed normalized plasma pressure using as GT dynamics a model learned similarly to Abbate et al. (2021). |
| Dataset Splits | No | The paper describes evaluation episodes and data collection during training but does not specify explicit train/validation/test dataset splits with percentages or counts, as the problems are continuous control environments rather than static datasets. |
| Hardware Specification | No | The paper mentions "when run on the author’s 24-core CPU machines" but does not provide specific CPU models, GPU details, or other hardware specifications. |
| Software Dependencies | No | The paper mentions using "PILCO (Deisenroth & Rasmussen, 2011)" and "PETS (Chua et al., 2018) as implemented by Pineda et al. (2021)" and "Gaussian process (GP) prior" but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 2: BARL hyperparameters used for each control problem. Table 4: Hyperparameters used for optimization in MPC procedure for control problems. |