Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings
Authors: Jesse Zhang, Brian Cheung, Chelsea Finn, Sergey Levine, Dinesh Jayaraman
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments on car driving, cartpole balancing, half-cheetah locomotion, and robotic object manipulation, CARL successfully acquires cautious exploration behaviors, yielding higher rewards with fewer failures than strong RL adaptation baselines. |
| Researcher Affiliation | Academia | Jesse Zhang (1), Brian Cheung (1), Chelsea Finn (2), Sergey Levine (1), Dinesh Jayaraman (3); (1) UC Berkeley, CA, USA; (2) Stanford, CA, USA; (3) University of Pennsylvania, PA, USA. |
| Pseudocode | Yes | Algorithm 1 Pretraining |
| Open Source Code | No | The paper provides a project website at https://sites.google.com/berkeley.edu/carl, which is a project overview page, not a direct link to a source-code repository, and the paper does not explicitly state that the code is open-source or provided in supplementary materials. |
| Open Datasets | Yes | We modify the standard Open AI Gym cartpole task... To test SCA in the Gym half-cheetah setting... Our driving environment is based on Duckietown (Chevalier-Boisvert et al., 2018)... The robotic manipulation environment, originally presented in PDDM (Nagabandi et al., 2019)... |
| Dataset Splits | No | The paper describes training in 'source sandbox environments' and adapting to a 'safety-critical target environment'. These are distinct environments, not explicit train/validation/test dataset splits with specified percentages or counts for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or specific computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software frameworks and algorithms such as OpenAI Gym, Duckietown, PDDM, PETS, MAML, PPO, and RARL, but does not provide version numbers for any of them. |
| Experiment Setup | Yes | First, we describe our four safety-critical adaptation settings in detail... In our experiments, we heuristically set γ = 50. |