Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees
Authors: Đorđe Žikelić, Mathias Lechner, Thomas A. Henzinger, Krishnendu Chatterjee
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach on 3 stochastic non-linear reinforcement learning tasks. (Evidence also includes the entire Experiments section with its result tables.) |
| Researcher Affiliation | Academia | ¹Institute of Science and Technology Austria (ISTA), ²Massachusetts Institute of Technology (MIT); djordje.zikelic@ist.ac.at, mlechner@mit.edu, tah@ist.ac.at, krishnendu.chatterjee@ist.ac.at |
| Pseudocode | Yes | Algorithm 1: Algorithm for learning reach-avoid policies |
| Open Source Code | Yes | Code is available at https://github.com/mlech26l/neural_martingales |
| Open Datasets | Yes | Our first two environments are a linear 2D system with nonlinear control bounds and the stochastic inverted pendulum control problem. The inverted pendulum environment is taken from the OpenAI Gym (Brockman et al. 2016)... Our third environment concerns a collision avoidance task. |
| Dataset Splits | No | The paper conducts experiments in reinforcement learning environments, where data is generated dynamically, and does not specify traditional train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like PPO and OpenAI Gym but does not provide specific version numbers for any libraries, frameworks, or solvers used in the implementation. |
| Experiment Setup | Yes | The policy and RASM networks consist of two hidden layers (128 units each, ReLU). The RASM network has a single output unit with a softplus activation. We run our algorithm with a timeout of 3 hours. For all tasks, we pre-train the policy networks using 100 iterations of PPO. Algorithm 1's parameters are also listed: mesh τ > 0, number of samples N ∈ ℕ, regularization constant λ > 0. |
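
The Experiment Setup row fully specifies the network shapes, so a short sketch can make them concrete. The following is a minimal sketch assuming PyTorch (the paper does not name its framework here); the class names and the input/output dimensions are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Two hidden layers of 128 ReLU units, as stated in the paper."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class RASMNetwork(nn.Module):
    """RASM candidate: same hidden layers, single softplus output unit."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Softplus(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```

The softplus output keeps the certificate value non-negative everywhere, which is a natural reading of why the paper gives the RASM network a single softplus output unit.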
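
Similarly, the 100-iteration PPO pre-training step could be approximated with an off-the-shelf implementation. The paper does not name a PPO library, so the sketch below, using stable-baselines3 and the Gym pendulum task mentioned in the Open Datasets row, is an assumption; the mapping of "100 iterations" to timesteps is likewise approximate.

```python
import gym
from stable_baselines3 import PPO

# Stand-in environment; the paper uses a stochastic inverted pendulum
# taken from OpenAI Gym, not necessarily this exact task ID.
env = gym.make("Pendulum-v1")

# Match the stated architecture: two hidden layers of 128 units.
model = PPO("MlpPolicy", env, policy_kwargs=dict(net_arch=[128, 128]))

# Roughly 100 PPO iterations at stable-baselines3's default rollout
# length of 2048 steps per iteration.
model.learn(total_timesteps=100 * 2048)
```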