Reachability Constrained Reinforcement Learning

Authors: Dongjie Yu, Haitong Ma, Shengbo Li, Jianyu Chen

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on different benchmarks validate the learned feasible set, the policy performance, and constraint satisfaction of RCRL, compared to CRL and safe control baselines.
Researcher Affiliation | Collaboration | (1) School of Vehicle and Mobility, Tsinghua University, Beijing, China; (2) John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA; (3) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (4) Shanghai Qizhi Institute, Shanghai, China.
Pseudocode | Yes | Algorithm 1 provides the pseudo-code of an actor-critic version of RCRL. A policy-gradient version of RCRL is designed similarly in Algorithm 2. (A hedged actor-critic sketch follows the table.)
Open Source Code | No | The paper does not contain an explicit statement or link indicating the public release of the source code for the methodology described.
Open Datasets | Yes | Benchmarks. We implement both on- and off-policy RCRL and compare them with different CRL baselines. Experiments (1) use the double-integrator (Fisac et al., 2019), which has an analytical solution, to check the correctness of the feasible set learned by RCRL; (2) validate the scalability of RCRL to nonlinear control problems, specifically a 2D quadrotor trajectory-tracking task in safe-control-gym (Yuan et al., 2021); and (3) evaluate on the classical safe-learning benchmark Safety-Gym (Achiam & Amodei, 2019). (A minimal Safety-Gym interaction sketch follows the table.)
Dataset Splits | No | The paper describes training and evaluation procedures, including averaging results over runs and specific initialization for evaluation, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts, as is typical for static datasets.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer, SAC, PPO, and multi-layer perceptrons, but it does not provide specific version numbers for these dependencies or name the programming language used.
Experiment Setup | Yes | Table 1 and Table 2 provide detailed hyperparameters for both off-policy and on-policy algorithms, including optimizer settings (Adam β1, β2), network architecture (number of hidden layers, neurons), learning rates, discount factors, batch sizes, and more. Appendix D.1 also specifies initialization ranges for variables in the quadrotor experiment. (A placeholder configuration skeleton follows the table.)
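
For the Pseudocode row: the paper's Algorithm 1 is not reproduced here, so the following is only a hedged sketch of what an actor-critic RCRL-style update can look like. It assumes PyTorch, a deterministic actor, a reward critic, a safety critic regressed toward a reachability-style target max(h(s), Vh(s', a')) (the worst constraint value along the trajectory), and a state-wise Lagrange multiplier network. All module names, shapes, and hyperparameters are assumptions, not the authors' code.

```python
# Hedged sketch of an actor-critic RCRL-style update (NOT the authors'
# Algorithm 1; every name, shape, and constant below is an assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


obs_dim, act_dim = 8, 2                                  # placeholder sizes
actor = mlp([obs_dim, 256, 256, act_dim])                # deterministic actor (sketch)
q_critic = mlp([obs_dim + act_dim, 256, 256, 1])         # reward critic Q(s, a)
safety_critic = mlp([obs_dim + act_dim, 256, 256, 1])    # reachability value Vh(s, a)
multiplier = mlp([obs_dim, 256, 256, 1])                 # state-wise multiplier net


def rcrl_losses(s, a, r, h, s_next, done, gamma=0.99):
    """All arguments are float tensors of shape (batch, 1) except s, a, s_next;
    h is the constraint value at s (h <= 0 means the constraint is satisfied)."""
    with torch.no_grad():
        a_next = actor(s_next)
        sa_next = torch.cat([s_next, a_next], dim=-1)
        # Reward critic: ordinary Bellman target.
        q_target = r + gamma * (1.0 - done) * q_critic(sa_next)
        # Safety critic: reachability-style self-consistency target; at
        # terminal transitions fall back to h(s) itself.
        vh_next = safety_critic(sa_next)
        vh_target = torch.maximum(h, torch.where(done > 0, h, vh_next))

    sa = torch.cat([s, a], dim=-1)
    q_loss = F.mse_loss(q_critic(sa), q_target)
    vh_loss = F.mse_loss(safety_critic(sa), vh_target)

    # Actor: maximize reward while penalizing predicted constraint violation,
    # weighted by a state-wise multiplier lambda(s) >= 0 (Lagrangian-style).
    a_pi = actor(s)
    sa_pi = torch.cat([s, a_pi], dim=-1)
    lam = F.softplus(multiplier(s))
    actor_loss = (-q_critic(sa_pi) + lam.detach() * safety_critic(sa_pi)).mean()
    # Multiplier: gradient ascent on the constraint term, so lambda grows
    # wherever the safety critic still predicts violation (Vh > 0).
    lam_loss = -(lam * safety_critic(sa_pi).detach()).mean()
    return q_loss, vh_loss, actor_loss, lam_loss
```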
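
For the Open Datasets row: a minimal interaction sketch for Safety-Gym, assuming the openai/safety-gym package and the classic four-tuple gym step API. The task id is an illustrative example; the specific Safety-Gym tasks evaluated in the paper are not listed in the row above.

```python
# Minimal Safety-Gym usage sketch; the task id below is illustrative only.
import gym
import safety_gym  # noqa: F401  (importing registers the Safexp-* tasks)

env = gym.make("Safexp-PointGoal1-v0")
obs = env.reset()
done, ep_return, ep_cost = False, 0.0, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    ep_return += reward
    ep_cost += info.get("cost", 0.0)   # per-step constraint-violation cost
print(f"return={ep_return:.2f}, cumulative cost={ep_cost:.2f}")
```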
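
For the Experiment Setup row: an illustrative configuration skeleton whose field names mirror the hyperparameter categories that row mentions (optimizer settings, network width and depth, learning rates, discount factor, batch size). Every value is a placeholder, not a number reported in the paper's Table 1, Table 2, or Appendix D.1.

```python
# Placeholder off-policy RCRL-style configuration; values are illustrative
# defaults, NOT the hyperparameters reported in the paper.
off_policy_rcrl_config = {
    "optimizer": "Adam",
    "adam_beta1": 0.9,        # placeholder
    "adam_beta2": 0.999,      # placeholder
    "hidden_layers": 2,       # placeholder MLP depth
    "hidden_units": 256,      # placeholder neurons per hidden layer
    "actor_lr": 3e-4,         # placeholder learning rate
    "critic_lr": 3e-4,        # placeholder learning rate
    "multiplier_lr": 1e-4,    # placeholder learning rate
    "discount_gamma": 0.99,   # placeholder discount factor
    "batch_size": 256,        # placeholder
}
```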