Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigated the performance of our algorithms on Problem (6) in a deep RL setting. In particular, we show the effectiveness of PID control at reducing constraint violations from oscillations and overshoot present in the baseline Lagrangian method. Both maximum performance and robustness to hyperparameter selection are considered.
Researcher Affiliation | Collaboration | 1 University of California, Berkeley; 2 OpenAI.
Pseudocode | Yes | Algorithm 1: Constraint-Controlled Reinforcement Learning.
Open Source Code | Yes | Additional training details can be found in supplementary materials, and our implementation is available at https://github.com/astooke/safe-rlpyt.
Open Datasets | Yes | We use the recent Safety Gym suite (Ray et al., 2019), which consists of robot locomotion tasks built on the MuJoCo simulator (Todorov et al., 2012).
Dataset Splits | No | The paper describes training agents in simulation environments but does not specify fixed train/validation/test dataset splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper mentions using "Proximal Policy Optimization (PPO)" and the "MuJoCo simulator" but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | Our policy is a 2-layer MLP followed by an LSTM with a skip connection. We applied smoothing to proportional and derivative controls to accommodate noisy estimates. The environment's finite horizons allowed use of nondiscounted episodic costs as the constraint and input to the controller. Algorithm 2 states "Choose tuning parameters: KP, KI, KD ≥ 0". Figure 4 shows results with "KP = 0.25 and KP = 1". Figure 9 shows results with "PI-control with KI = 1e-3, KP = 0.1". (See the hedged sketch of the PID multiplier update after this table.)
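
The controller settings quoted above (KP, KI, KD ≥ 0, with KI-only recovering the standard Lagrangian update) can be illustrated with a short sketch. The Python snippet below is a minimal, hedged reconstruction of a PID-controlled Lagrange multiplier in the spirit of the paper's Algorithm 2; the class name, method names, default gains, and the toy usage loop are illustrative assumptions and are not taken from the released safe-rlpyt code, and the smoothing the authors apply to the proportional and derivative terms is omitted for brevity.

```python
class PIDLagrangianMultiplier:
    """Minimal sketch of a PID-controlled Lagrange multiplier (cf. Algorithm 2).

    Setting kp = kd = 0 recovers the usual integral-only Lagrangian update.
    """

    def __init__(self, cost_limit, kp=0.25, ki=1e-3, kd=0.0):
        self.cost_limit = cost_limit          # constraint threshold d
        self.kp, self.ki, self.kd = kp, ki, kd  # tuning parameters KP, KI, KD >= 0
        self.integral = 0.0                   # accumulated constraint violation
        self.prev_cost = 0.0                  # previous cost estimate, for the D term
        self.value = 0.0                      # current multiplier lambda

    def update(self, episodic_cost):
        """Update lambda from the latest nondiscounted episodic cost estimate."""
        error = episodic_cost - self.cost_limit                 # proportional term
        self.integral = max(0.0, self.integral + error)         # clamped integral term
        derivative = max(0.0, episodic_cost - self.prev_cost)   # one-sided derivative
        self.prev_cost = episodic_cost
        # Project the multiplier onto [0, inf).
        self.value = max(0.0, self.kp * error
                              + self.ki * self.integral
                              + self.kd * derivative)
        return self.value


# Toy usage with stand-in cost measurements (values are made up).
controller = PIDLagrangianMultiplier(cost_limit=25.0, kp=0.1, ki=1e-3)
for episodic_cost in [40.0, 35.0, 28.0, 24.0, 26.0]:
    lam = controller.update(episodic_cost)
    # In the constraint-controlled RL loop (Algorithm 1), a PPO-style policy
    # update would then weight the cost objective by lam before the next
    # batch of rollouts is collected.
```

In the outer loop of Algorithm 1, each iteration collects rollouts, estimates the episodic cost, feeds it to the controller to get a new multiplier, and then performs the policy update with the reward and cost objectives combined through that multiplier; this feedback structure is what damps the oscillations and overshoot of the baseline Lagrangian method.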