Stability Verification in Stochastic Control Systems via Neural Network Supermartingales
Authors: Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas A. Henzinger
AAAI 2022, pp. 7326-7336 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies. |
| Researcher Affiliation | Academia | IST Austria, Klosterneuburg, Austria {mathias.lechner, djordje.zikelic, krishnendu.chatterjee, tah}@ist.ac.at |
| Pseudocode | Yes | Algorithm 1: Verification of a.s. asymptotic stability |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | No | The paper uses two benchmark environments: a two-dimensional dynamical system and a stochastic variant of the inverted pendulum problem. However, it does not explicitly provide access information (link, DOI, formal citation) to a pre-collected, publicly available dataset used for training. |
| Dataset Splits | No | The paper describes training policies within RL environments and then verifying them. It does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for a pre-collected dataset, as data is generated through interaction with the environment during policy training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'proximal policy optimization' and 'OpenAI Gym' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Our RSM neural networks consist of one hidden layer with 128 ReLU units. For each RL task, we consider the state space X = {x \| \|\|x\|\|_1 ≤ 0.5} and train a control policy comprised of two hidden layers with 128 ReLU units each by using proximal policy optimization (Schulman et al. 2017), while applying our Lipschitz regularization to keep the Lipschitz constant of the policy within a reasonable bound. We then run our algorithm to verify that the region Xs = {x \| \|\|x\|\|_1 ≤ 0.2} is a.s. asymptotically stable. (Algorithm 1) Input: dynamics function f, policy π, disturbance distribution d, region Xs ⊆ X, Lipschitz constants Lf, Lπ; parameters τ > 0, N ∈ ℕ, λ > 0 |
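
The "Experiment Setup" row quotes the network sizes reported in the paper. As a point of reference, below is a minimal sketch of networks matching those shapes, written in PyTorch (the paper does not name its framework); the state and action dimensions are assumptions for illustration only.

```python
# Hypothetical sketch of the quoted architectures: an RSM candidate with one
# hidden layer of 128 ReLU units, and a control policy with two hidden layers
# of 128 ReLU units each. STATE_DIM and ACTION_DIM are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM = 2   # assumption: both benchmarks are low-dimensional
ACTION_DIM = 1  # assumption

class RSMNetwork(nn.Module):
    """Candidate ranking supermartingale V(x): one hidden ReLU layer, 128 units."""
    def __init__(self, state_dim: int = STATE_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class PolicyNetwork(nn.Module):
    """Control policy pi(x): two hidden ReLU layers, 128 units each."""
    def __init__(self, state_dim: int = STATE_DIM, action_dim: int = ACTION_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

The PPO training and the Lipschitz regularization mentioned in the quote are not reproduced here; the sketch only fixes the layer shapes the paper states.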
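The quoted Algorithm 1 inputs (f, π, d, Xs ⊆ X, Lf, Lπ, τ, N, λ) point to a grid-based check of an expected-decrease (supermartingale) condition over X \ Xs. The sketch below is a simplified illustration of such a check, not the paper's exact procedure: the Lipschitz-based slack term K and the sampling-based estimate of the expectation stand in for the precise bounds of Algorithm 1, and the 1-norm regions follow the quoted setup.

```python
# Simplified, assumed illustration of a grid-based expected-decrease check.
# The slack K is a placeholder for the paper's Lipschitz-based error bound.
import itertools
import numpy as np

def verify_expected_decrease(V, f, pi, sample_disturbance,
                             tau=0.01, N=100, K=1.0,
                             x_bound=0.5, xs_bound=0.2, state_dim=2):
    """Check that every grid point x in X \\ Xs (1-norm balls, as quoted) satisfies
    mean_i V(f(x, pi(x), w_i)) <= V(x) - tau * K, with the expectation over the
    disturbance estimated from N samples."""
    axis = np.arange(-x_bound, x_bound + tau, tau)  # per-axis grid with mesh tau
    for point in itertools.product(axis, repeat=state_dim):
        x = np.asarray(point)
        norm1 = np.sum(np.abs(x))
        if norm1 > x_bound or norm1 <= xs_bound:
            continue  # keep only grid points inside X but outside Xs
        u = pi(x)
        samples = [V(f(x, u, sample_disturbance())) for _ in range(N)]
        if np.mean(samples) > V(x) - tau * K:
            return False  # expected-decrease condition violated at this grid cell
    return True
```

A trained `RSMNetwork` and `PolicyNetwork` from the first sketch could be passed in as callables, e.g. `V=lambda x: float(rsm(torch.as_tensor(x, dtype=torch.float32)))` and similarly for `pi`, with `f` and `sample_disturbance` supplied by the environment model.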