Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings
Authors: Jesse Zhang, Brian Cheung, Chelsea Finn, Sergey Levine, Dinesh Jayaraman
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments on car driving, cartpole balancing, half-cheetah locomotion, and robotic object manipulation, CARL successfully acquires cautious exploration behaviors, yielding higher rewards with fewer failures than strong RL adaptation baselines. |
| Researcher Affiliation | Academia | Jesse Zhang (UC Berkeley, CA, USA); Brian Cheung (UC Berkeley, CA, USA); Chelsea Finn (Stanford, CA, USA); Sergey Levine (UC Berkeley, CA, USA); Dinesh Jayaraman (University of Pennsylvania, PA, USA) |
| Pseudocode | Yes | Algorithm 1 Pretraining |
| Open Source Code | No | The paper provides a project website (https://sites.google.com/berkeley.edu/carl), but this is an overview page rather than a link to a source-code repository, and the paper does not state that the code is open-sourced or included in supplementary materials. |
| Open Datasets | Yes | We modify the standard OpenAI Gym cartpole task... To test SCA in the Gym half-cheetah setting... Our driving environment is based on Duckietown (Chevalier-Boisvert et al., 2018)... The robotic manipulation environment, originally presented in PDDM (Nagabandi et al., 2019)... (the two Gym tasks are loaded in the first sketch below the table) |
| Dataset Splits | No | The paper describes training in 'source sandbox environments' and adapting to a 'safety-critical target environment'; these are distinct environments rather than explicit train/validation/test splits with specified percentages or counts. |
| Hardware Specification | No | The paper does not report the hardware used for its experiments, such as GPU or CPU models, memory, or other machine specifications. |
| Software Dependencies | No | The paper mentions software frameworks and algorithms such as OpenAI Gym, Duckietown, PDDM, PETS, MAML, PPO, and RARL, but does not provide version numbers for any of them. |
| Experiment Setup | Yes | First, we describe our four safety-critical adaptation settings in detail... In our experiments, we heuristically set γ = 50. (How γ might enter the objective is sketched in the second snippet below the table.) |
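The Open Datasets row names two standard OpenAI Gym tasks alongside Duckietown and PDDM. As a starting point for reproduction, the snippet below simply loads those two Gym tasks; the version strings `CartPole-v1` and `HalfCheetah-v2` and the classic Gym step API are assumptions, and the paper's modifications to these tasks, which the table does not detail, are not reproduced here.

```python
# Minimal sketch, assuming the classic Gym step API and these version
# strings; the paper's modifications to the standard tasks are not
# reproduced here. HalfCheetah additionally requires a MuJoCo install.
import gym

for env_id in ("CartPole-v1", "HalfCheetah-v2"):
    env = gym.make(env_id)
    obs = env.reset()
    # One random step just to confirm the environment loads and runs.
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, "loaded; reward after one random step:", reward)
    env.close()
```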
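The Experiment Setup row records the heuristic choice γ = 50, but not how γ enters CARL's planning objective. The sketch below is one plausible reading, offered purely as an assumption: γ weights a predicted per-step catastrophe probability that is subtracted from predicted reward when scoring candidate action sequences under a learned model.

```python
# Minimal sketch, NOT the paper's verified objective: it assumes gamma
# weights a predicted per-step catastrophe probability subtracted from
# predicted reward when scoring a candidate action sequence.
import numpy as np

GAMMA = 50.0  # heuristic value reported in the paper


def cautious_score(predicted_rewards, catastrophe_probs):
    """Score one candidate action sequence.

    predicted_rewards: per-step reward predictions along the sequence.
    catastrophe_probs: per-step predicted catastrophe probabilities
        (both hypothetical outputs of a learned model ensemble).
    """
    predicted_rewards = np.asarray(predicted_rewards, dtype=float)
    catastrophe_probs = np.asarray(catastrophe_probs, dtype=float)
    return float(np.sum(predicted_rewards - GAMMA * catastrophe_probs))


# A 2% predicted catastrophe chance on one step cancels a full unit of
# reward at gamma = 50: 3.0 - 50 * 0.02 = 2.0.
print(cautious_score([1.0, 1.0, 1.0], [0.0, 0.02, 0.0]))
```

Under this assumed form, a large γ lets even small predicted catastrophe probabilities dominate the score, which would push a planner toward the cautious behavior the abstract describes.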