Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigated the performance of our algorithms on Problem (6) in a deep RL setting. In particular, we show the effectiveness of PID control at reducing constraint violations from oscillations and overshoot present in the baseline Lagrangian method. Both maximum performance and robustness to hyperparameter selection are considered.
Researcher Affiliation | Collaboration | 1 University of California, Berkeley; 2 OpenAI.
Pseudocode | Yes | Algorithm 1: Constraint-Controlled Reinforcement Learning.
Open Source Code | Yes | Additional training details can be found in supplementary materials, and our implementation is available at https://github.com/astooke/safe-rlpyt.
Open Datasets | Yes | We use the recent Safety Gym suite (Ray et al., 2019), which consists of robot locomotion tasks built on the MuJoCo simulator (Todorov et al., 2012).
Dataset Splits | No | The paper describes training agents in simulation environments but does not specify fixed train/validation/test dataset splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper mentions using "Proximal Policy Optimization (PPO)" and the "MuJoCo simulator" but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | Our policy is a 2-layer MLP followed by an LSTM with a skip connection. We applied smoothing to proportional and derivative controls to accommodate noisy estimates. The environment's finite horizons allowed use of nondiscounted episodic costs as the constraint and input to the controller. Algorithm 2 states "Choose tuning parameters: KP, KI, KD ≥ 0". Figure 4 shows results with "KP = 0.25 and KP = 1". Figure 9 shows results with "PI-control with KI = 1e-3, KP = 0.1". (See the hedged sketch of the PID multiplier update after this table.)
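
The controller settings quoted above (KP, KI, KD ≥ 0, with KI-only recovering the standard Lagrangian update) can be illustrated with a short sketch. The Python snippet below is a minimal, hedged reconstruction of a PID-controlled Lagrange multiplier in the spirit of the paper's Algorithm 2; the class name, method names, default gains, and the toy usage loop are illustrative assumptions and are not taken from the released safe-rlpyt code, and the smoothing the authors apply to the proportional and derivative terms is omitted for brevity.

```python
class PIDLagrangianMultiplier:
    """Minimal sketch of a PID-controlled Lagrange multiplier (cf. Algorithm 2).

    Setting kp = kd = 0 recovers the usual integral-only Lagrangian update.
    """

    def __init__(self, cost_limit, kp=0.25, ki=1e-3, kd=0.0):
        self.cost_limit = cost_limit          # constraint threshold d
        self.kp, self.ki, self.kd = kp, ki, kd  # tuning parameters KP, KI, KD >= 0
        self.integral = 0.0                   # accumulated constraint violation
        self.prev_cost = 0.0                  # previous cost estimate, for the D term
        self.value = 0.0                      # current multiplier lambda

    def update(self, episodic_cost):
        """Update lambda from the latest nondiscounted episodic cost estimate."""
        error = episodic_cost - self.cost_limit                 # proportional term
        self.integral = max(0.0, self.integral + error)         # clamped integral term
        derivative = max(0.0, episodic_cost - self.prev_cost)   # one-sided derivative
        self.prev_cost = episodic_cost
        # Project the multiplier onto [0, inf).
        self.value = max(0.0, self.kp * error
                              + self.ki * self.integral
                              + self.kd * derivative)
        return self.value


# Toy usage with stand-in cost measurements (values are made up).
controller = PIDLagrangianMultiplier(cost_limit=25.0, kp=0.1, ki=1e-3)
for episodic_cost in [40.0, 35.0, 28.0, 24.0, 26.0]:
    lam = controller.update(episodic_cost)
    # In the constraint-controlled RL loop (Algorithm 1), a PPO-style policy
    # update would then weight the cost objective by lam before the next
    # batch of rollouts is collected.
```

In the outer loop of Algorithm 1, each iteration collects rollouts, estimates the episodic cost, feeds it to the controller to get a new multiplier, and then performs the policy update with the reward and cost objectives combined through that multiplier; this feedback structure is what damps the oscillations and overshoot of the baseline Lagrangian method.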