Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigated the performance of our algorithms on Problem (6) in a deep RL setting. In particular, we show the effectiveness of PID control at reducing constraint violations from oscillations and overshoot present in the baseline Lagrangian method. Both maximum performance and robustness to hyperparameter selection are considered. |
| Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²OpenAI. |
| Pseudocode | Yes | Algorithm 1 Constraint-Controlled Reinforcement Learning (a toy sketch of this loop appears after the table) |
| Open Source Code | Yes | Additional training details can be found in supplementary materials, and our implementation is available at https://github.com/astooke/safe-rlpyt. |
| Open Datasets | Yes | We use the recent Safety Gym suite (Ray et al., 2019), which consists of robot locomotion tasks built on the MuJoCo simulator (Todorov et al., 2012). |
| Dataset Splits | No | The paper describes training agents in simulation environments but does not specify fixed train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not explicitly provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using “Proximal Policy Optimization (PPO)” and the “MuJoCo simulator” but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Our policy is a 2-layer MLP followed by an LSTM with a skip connection. We applied smoothing to proportional and derivative controls to accommodate noisy estimates. The environment’s finite horizons allowed use of nondiscounted episodic costs as the constraint and input to the controller. Algorithm 2 states “Choose tuning parameters: KP, KI, KD ≥ 0”. Figure 4 shows results with “KP = 0.25 and KP = 1”. Figure 9 shows results with “PI-control with KI = 1e-3, KP = 0.1”. (A sketch of this PID multiplier update appears after the table.) |
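
For concreteness, here is a minimal Python sketch of the paper's Algorithm 2, the PID-controlled Lagrange multiplier, assuming a scalar episodic cost and cost limit as described in the Experiment Setup row. The class name and defaults are ours (the gain values echo the KP = 0.25 and KI = 1e-3 quoted from Figures 4 and 9), and the smoothing the authors apply to the proportional and derivative signals is omitted for brevity.

```python
class PIDLagrangianController:
    """Sketch of Algorithm 2: PID control of the Lagrange multiplier.

    The multiplier reacts to the constraint violation (J_C - d)
    proportionally (K_P), through a clamped running integral (K_I),
    and through the rate of cost increase (K_D).
    """

    def __init__(self, k_p=0.25, k_i=1e-3, k_d=0.0):
        assert min(k_p, k_i, k_d) >= 0.0, "gains must be nonnegative"
        self.k_p, self.k_i, self.k_d = k_p, k_i, k_d
        self.integral = 0.0   # running integral of the violation
        self.prev_cost = 0.0  # previous episodic cost, for the derivative term

    def update(self, episodic_cost, cost_limit):
        delta = episodic_cost - cost_limit                     # constraint violation
        self.integral = max(self.integral + delta, 0.0)        # clamped at zero
        derivative = max(episodic_cost - self.prev_cost, 0.0)  # penalize rising cost only
        self.prev_cost = episodic_cost
        lam = self.k_p * delta + self.k_i * self.integral + self.k_d * derivative
        return max(lam, 0.0)                                   # multiplier stays nonnegative
```

Setting KP = KD = 0 reduces this to the integral-only update of the traditional Lagrangian method, which is exactly the oscillation- and overshoot-prone baseline the paper sets out to fix.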
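And a toy closed-loop demo in the shape of Algorithm 1, the constraint-controlled RL loop named in the Pseudocode row. In the paper the controller is interleaved with PPO updates on the λ-weighted objective; here a one-line cost model stands in for the learner, purely to show the feedback structure, and all numbers are invented.

```python
controller = PIDLagrangianController(k_p=0.25, k_i=1e-3)
cost_limit = 25.0
cost = 60.0  # imagine an initial policy that violates the constraint

for iteration in range(1000):
    lam = controller.update(cost, cost_limit)  # feedback control on the cost signal
    # Stand-in for a policy update on reward - lam * cost: a larger multiplier
    # pushes the cost down, while no penalty lets it drift back toward 60.
    cost = max(cost + 0.1 * (60.0 - cost) - 2.0 * lam, 0.0)

print(f"final cost {cost:.1f} vs. limit {cost_limit}")  # settles near the limit
```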