Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees

Authors: Đorđe Žikelić, Mathias Lechner, Thomas A. Henzinger, Krishnendu Chatterjee

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our approach on 3 stochastic non-linear reinforcement learning tasks." and the entire Experiments section with tables.
Researcher Affiliation | Academia | Institute of Science and Technology Austria (ISTA); Massachusetts Institute of Technology (MIT); djordje.zikelic@ist.ac.at, mlechner@mit.edu, tah@ist.ac.at, krishnendu.chatterjee@ist.ac.at
Pseudocode | Yes | Algorithm 1: Algorithm for learning reach-avoid policies
Open Source Code | Yes | "Code is available at https://github.com/mlech26l/neural_martingales"
Open Datasets | Yes | "Our first two environments are a linear 2D system with nonlinear control bounds and the stochastic inverted pendulum control problem. The inverted pendulum environment is taken from the OpenAI Gym (Brockman et al. 2016)... Our third environment concerns a collision avoidance task."
Dataset Splits | No | The paper conducts experiments in reinforcement learning environments, where data is generated dynamically, and does not specify traditional train/validation/test dataset splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions software components such as PPO and OpenAI Gym but does not provide specific version numbers for any libraries, frameworks, or solvers used in the implementation.
Experiment Setup | Yes | "The policy and RASM networks consist of two hidden layers (128 units each, ReLU). The RASM network has a single output unit with a softplus activation. We run our algorithm with a timeout of 3 hours. For all tasks, we pre-train the policy networks using 100 iterations of PPO." The paper also lists the algorithm parameters: mesh τ > 0, number of samples N, and regularization constant λ > 0 (see the sketch below).
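
To make the reported architecture concrete, here is a minimal sketch of the policy and RASM networks as described in the Experiment Setup row. The paper does not state the deep-learning framework or the state/action dimensions, so PyTorch and the STATE_DIM/ACTION_DIM values below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch; layer sizes and activations follow the paper's
# description, while input/output dimensions are hypothetical placeholders.
import torch.nn as nn

STATE_DIM = 2   # hypothetical, e.g. a 2D system state
ACTION_DIM = 1  # hypothetical

# Policy network: two hidden layers of 128 ReLU units each.
policy_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)

# RASM (reach-avoid supermartingale) candidate network: same hidden layers,
# a single output unit, and a softplus activation so the certificate value
# stays non-negative.
rasm_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Softplus(),
)
```

The released repository linked in the Open Source Code row contains the authors' actual implementation and exact hyperparameters, including how the mesh τ, the sample count N, and the regularization constant λ enter the training loop of Algorithm 1.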