Stability Verification in Stochastic Control Systems via Neural Network Supermartingales

Authors: Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas A. Henzinger

AAAI 2022 (pp. 7326-7336) | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies."
Researcher Affiliation | Academia | IST Austria, Klosterneuburg, Austria ({mathias.lechner, djordje.zikelic, krishnendu.chatterjee, tah}@ist.ac.at)
Pseudocode | Yes | Algorithm 1: Verification of a.s. asymptotic stability (a sketch of this kind of check appears after the table)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the methodology is publicly available.
Open Datasets | No | The paper uses two benchmark environments: a two-dimensional dynamical system and a stochastic variant of the inverted pendulum problem. However, it does not explicitly provide access information (link, DOI, formal citation) to a pre-collected, publicly available dataset used for training.
Dataset Splits | No | The paper describes training policies within RL environments and then verifying them. It does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for a pre-collected dataset, as data is generated through interaction with the environment during policy training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using 'proximal policy optimization' and 'OpenAI Gym' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | "Our RSM neural networks consist of one hidden layer with 128 ReLU units. For each RL task, we consider the state space X = {x | ||x||₁ ≤ 0.5} and train a control policy comprised of two hidden layers with 128 ReLU units each by using proximal policy optimization (Schulman et al. 2017), while applying our Lipschitz regularization to keep the Lipschitz constant of the policy within a reasonable bound. We then run our algorithm to verify that the region Xs = {x | ||x||₁ ≤ 0.2} is a.s. asymptotically stable." Input: dynamics function f, policy π, disturbance distribution d, region Xs ⊆ X, Lipschitz constants Lf, Lπ; parameters τ > 0, N ∈ ℕ, λ > 0.
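The Experiment Setup row fixes the network shapes reported in the paper: an RSM network with one hidden layer of 128 ReLU units and a control policy with two hidden layers of 128 ReLU units each, operating on X = {x | ||x||₁ ≤ 0.5} with target region Xs = {x | ||x||₁ ≤ 0.2}. Since no source code is released (see the Open Source Code row), the following is only a minimal sketch of those shapes, assuming PyTorch and a two-dimensional state; the variable names and the scalar action dimension are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

state_dim = 2   # e.g. the two-dimensional dynamical system benchmark
action_dim = 1  # illustrative; the actual action dimension is task-specific

# RSM network: one hidden layer with 128 ReLU units, scalar output V(x).
rsm_net = nn.Sequential(
    nn.Linear(state_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

# Control policy: two hidden layers with 128 ReLU units each, trained with PPO
# under a Lipschitz regularizer (PPO training and the regularizer not shown).
policy_net = nn.Sequential(
    nn.Linear(state_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, action_dim),
)

# Regions quoted above: X = {x : ||x||_1 <= 0.5}, Xs = {x : ||x||_1 <= 0.2}.
def in_state_space(x: torch.Tensor) -> torch.Tensor:
    return x.abs().sum(dim=-1) <= 0.5

def in_target_region(x: torch.Tensor) -> torch.Tensor:
    return x.abs().sum(dim=-1) <= 0.2
```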
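The Pseudocode row lists only the inputs of Algorithm 1 (dynamics f, policy π, disturbance distribution d, region Xs ⊆ X, Lipschitz constants Lf, Lπ, and parameters τ, N, λ). The sketch below illustrates the general flavor of a grid-based RSM decrease check consistent with those inputs; it is an assumption-laden approximation, not the authors' algorithm. In particular, it substitutes a sampling estimate for a sound bound on the expectation, uses N as a sample count purely for illustration, and the exact condition and margins are placeholders.

```python
import itertools
import numpy as np

def verify_expected_decrease(rsm, policy, f, sample_disturbance,
                             lipschitz_bound, tau=0.01, N=16, lam=1e-3):
    """Illustrative check that E[V(f(x, pi(x), w))] <= V(x) - lam holds on a
    tau-grid over X outside the target region Xs.

    rsm, policy, f and sample_disturbance are user-supplied callables on
    NumPy arrays; lipschitz_bound stands in for a constant derived from
    L_f, L_pi and the RSM network's Lipschitz constant (derivation not shown).
    """
    grid_1d = np.arange(-0.5, 0.5 + tau, tau)
    for point in itertools.product(grid_1d, repeat=2):
        x = np.asarray(point)
        norm1 = np.abs(x).sum()
        if norm1 <= 0.2 or norm1 > 0.5:
            continue  # the decrease condition is only required on X outside Xs
        u = policy(x)
        # Sampling estimate of E[V(x')] over the disturbance distribution;
        # a sound verifier would bound this expectation formally instead.
        expected_next = np.mean([rsm(f(x, u, sample_disturbance()))
                                 for _ in range(N)])
        # Demand a strictly larger decrease at grid points so the condition
        # can transfer to off-grid states in the same cell via the Lipschitz bound.
        if expected_next > rsm(x) - lam - tau * lipschitz_bound:
            return False, x  # could not certify the condition at this cell
    return True, None
```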