Verified Safe Reinforcement Learning for Neural Network Dynamic Models
Authors: Junlin Wu, Huan Zhang, Yevgeniy Vorobeychik
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on five safe control problems demonstrate that our trained controllers can achieve verified safety over horizons that are as much as an order of magnitude longer than state-of-the-art baselines, while maintaining high reward, as well as a perfect safety record over entire episodes. |
| Researcher Affiliation | Academia | Junlin Wu (Computer Science & Engineering, Washington University in St. Louis, junlin.wu@wustl.edu); Huan Zhang (Electrical & Computer Engineering, University of Illinois Urbana-Champaign, huan@huan-zhang.com); Yevgeniy Vorobeychik (Computer Science & Engineering, Washington University in St. Louis, yvorobeychik@wustl.edu) |
| Pseudocode | Yes | Algorithm 1 (Curriculum Learning with Memorization) and Algorithm 2 (Initial-State-Dependent Controller) are presented in the paper (an illustrative curriculum skeleton follows the table). |
| Open Source Code | Yes | Our code is available at https://github.com/jlwu002/VSRL. |
| Open Datasets | No | The paper describes the environment dynamics and initial state regions (S0) for its simulations (e.g., "Our lane following environment follows the discrete-time bicycle model"), but it does not specify or link to a publicly available or open dataset in the conventional sense (e.g., a named dataset with a specific access URL or citation). |
| Dataset Splits | No | The paper describes how initial regions S0 are split into a grid G0 for verification and how empirical safety is evaluated by sampling datapoints from S0, but it does not specify explicit training/validation/test dataset splits in percentages or counts for model training. |
| Hardware Specification | Yes | Our code runs on an AMD Ryzen 9 5900X CPU with a 12-core processor and an NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using the "α,β-CROWN toolbox," which is "implemented with the auto_LiRPA package," but it does not provide specific version numbers for these software dependencies (a minimal bound-computation sketch follows the table). |
| Experiment Setup | Yes | We evaluate our proposed approach in five control settings: Lane Following, Vehicle Avoidance, 2D Quadrotor (with both fixed and moving obstacles), and 3D Quadrotor [Kong et al., 2015, Dai et al., 2021]. The dynamics of these environments are approximated using NNs with ReLU activations, and we use a continuous action space for these discrete-time systems. In each experiment, we specify the initial region S0 for which we wish to achieve verified safety and then aim for the maximum horizon K over which safety can be verified. Our controller is initialized from a controller pretrained with a safe RL algorithm. When training with the bound loss, we add a large penalty on unsafe states to incentivize maintaining safety throughout the entire trajectory. For branch and bound during verification, we set the precision limit to 0.025: as soon as a grid region reaches this precision, we stop branching. For the dynamics approximation, we use an NN with two ReLU layers, each of size 8. The parameters are set as dmax = 0.7, θmax = π/4, and vmax = 5.0, and the initial region S0 is x ∈ [−0.5, 0.5], θ ∈ [−0.2, 0.2], v ∈ [0.0, 0.5] (a sketch of this setup follows the table). |
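
To make the reported setup concrete, here is a minimal PyTorch sketch of the described dynamics approximator (two ReLU layers of size 8) and the initial region S0 for the lane-following setting. The state/action dimensions and the `sample_s0` helper are our assumptions for illustration, not taken from the paper's code.

```python
import torch
import torch.nn as nn

# Two hidden ReLU layers of size 8, as described for the dynamics
# approximation. STATE_DIM and ACTION_DIM are assumptions for the
# lane-following setting (state = [x, theta, v], scalar action).
STATE_DIM, ACTION_DIM, HIDDEN = 3, 1, 8

dynamics = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, STATE_DIM),  # predicts the next state
)

# Initial region S0: x in [-0.5, 0.5], theta in [-0.2, 0.2], v in [0.0, 0.5].
S0_LOW = torch.tensor([-0.5, -0.2, 0.0])
S0_HIGH = torch.tensor([0.5, 0.2, 0.5])

def sample_s0(n: int) -> torch.Tensor:
    """Uniformly sample initial states from S0 for empirical safety checks."""
    return S0_LOW + (S0_HIGH - S0_LOW) * torch.rand(n, STATE_DIM)
```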
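Since the paper reports using the α,β-CROWN toolbox built on auto_LiRPA but gives no versions, the following sketch shows how one might bound the dynamics network's outputs over a region with auto_LiRPA (reusing the `dynamics` network and dimensions from the previous sketch). The `eps` value, the zero center, and the choice of the `"CROWN"` method are illustrative assumptions, not the paper's verification pipeline.

```python
import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

# Wrap the dynamics network for linear-relaxation-based bound propagation.
dummy_input = torch.zeros(1, STATE_DIM + ACTION_DIM)
bounded_dynamics = BoundedModule(dynamics, dummy_input)

# Bound outputs over an L-infinity ball around a grid-cell center;
# eps = 0.025 mirrors the precision limit quoted above (an assumption).
center = torch.zeros(1, STATE_DIM + ACTION_DIM)
ptb = PerturbationLpNorm(norm=float("inf"), eps=0.025)
x = BoundedTensor(center, ptb)

# Sound lower/upper bounds on the next state for every input in the ball.
lb, ub = bounded_dynamics.compute_bounds(x=(x,), method="CROWN")
```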
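Finally, Algorithm 1 is only named here, not reproduced; the sketch below is NOT the paper's algorithm but a generic skeleton of a curriculum over growing verification horizons with branch-and-bound splitting of the initial region. The box representation of regions and the `verify_region` and `finetune` interfaces are all our assumptions.

```python
import torch

def split(region):
    """Bisect an axis-aligned box (low, high) along its widest dimension."""
    low, high = region
    dim = torch.argmax(high - low)
    mid = (low[dim] + high[dim]) / 2
    left_high, right_low = high.clone(), low.clone()
    left_high[dim], right_low[dim] = mid, mid
    return [(low, left_high), (right_low, high)]

def curriculum_training(controller, s0, k_max, verify_region, finetune,
                        precision=0.025):
    """Illustrative skeleton only, NOT the paper's Algorithm 1. Grows the
    verified horizon k, branches regions that are too coarse, and fine-tunes
    the controller on regions that fail at the precision limit. The caller
    supplies `verify_region(controller, region, k) -> bool` and
    `finetune(controller, region, k) -> controller`."""
    for k in range(1, k_max + 1):
        queue, verified = [s0], []  # record subregions proven safe at this k
        while queue:
            region = queue.pop()
            low, high = region
            if verify_region(controller, region, k):
                verified.append(region)
            elif (high - low).max() > precision:
                queue.extend(split(region))  # branch and bound
            else:
                # Precision limit reached: fine-tune the controller on this
                # hard region (a real implementation would then re-verify).
                controller = finetune(controller, region, k)
    return controller, verified
```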