Enforcing robust control guarantees within neural network policies

Authors: Priya L. Donti, Melrose Roderick, Mahyar Fazlyab, J. Zico Kolter

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: Having instantiated our general framework, we demonstrate the power of our approach on a variety of simulated control domains. In particular, we evaluate performance on the following metrics: (1) Average-case performance: how well does the method optimize the performance objective (i.e., LQR cost) under average (non-worst-case) dynamics? (2) Worst-case stability: does the method remain stable even when subjected to adversarial (worst-case) dynamics? In all cases, we show that our method is able to improve performance over traditional robust controllers under average conditions, while still guaranteeing stability under worst-case conditions. (A minimal sketch of the LQR-cost metric appears after the table.)
Researcher Affiliation | Collaboration | Priya L. Donti (1), Melrose Roderick (1), Mahyar Fazlyab (2), J. Zico Kolter (1, 3); (1) Carnegie Mellon University, (2) Johns Hopkins University, (3) Bosch Center for AI
Pseudocode | Yes | Algorithm 1: Learning provably robust controllers with deep RL
Open Source Code | Yes | Code for all experiments is available at https://github.com/locuslab/robust-nn-control
Open Datasets | No | For our experiments, we build upon the microgrid setting given in Lam et al. (2016). In this system, the state x ∈ R^3 captures voltage deviations, frequency deviations, and the amount of power generated by a diesel generator connected to the grid; the action u ∈ R^2 describes the current associated with a storage device and a solar PV inverter; and the disturbance w ∈ R describes the difference between the amount of power demanded and the amount of power produced by solar panels on the grid. We generate NLDIs of the form (3) with s = 5, a = 3, and d = k = 2 by drawing the matrices A, B, G, C, and D i.i.d. from normal distributions, and producing the disturbance w(t) using a randomly-initialized neural network (with its output scaled to satisfy the norm bound on the disturbance). The paper describes generating simulated environments/systems rather than using pre-existing public datasets with concrete access information. (A sketch of this synthetic NLDI generation appears after the table.)
Dataset Splits | Yes | Robust PPO is trained for 50,000 updates, where each update samples 8 roll-outs; we choose the model that performs best on a hold-out set of initial conditions during training.
Hardware Specification | Yes | All experiments were run on an XPS 13 laptop with an Intel i7 processor.
Software Dependencies | No | The paper mentions specific algorithms and techniques but does not list software dependencies with version numbers (e.g., Python, PyTorch, or specific solver versions).
Experiment Setup | Yes | Robust MBP is optimized using gradient descent for 1,000 updates, where each update samples 20 roll-outs. Robust PPO is trained for 50,000 updates, where each update samples 8 roll-outs... The learning rate we chose for our model-based planner... we tried learning rates of 1 × 10^-3, 1 × 10^-4, and 1 × 10^-5, and found that 1 × 10^-3 worked best for the non-robust version and 1 × 10^-4 worked best for the robust version. For our PPO hyperparameters, we simply used those from the original PPO paper. (This setup is summarized as a configuration sketch below.)
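
The average-case metric referenced in the Research Type row is the LQR cost accumulated along a roll-out. The snippet below is a minimal sketch of how such a discretized quadratic cost could be computed; the cost matrices Q and R, the time step, and the trajectory format are illustrative assumptions, not details taken from the paper or its code.

    import numpy as np

    def lqr_cost(states, actions, Q, R, dt=0.01):
        """Discretized LQR cost sum_t (x_t' Q x_t + u_t' R u_t) * dt along a roll-out."""
        return float(sum(x @ Q @ x + u @ R @ u for x, u in zip(states, actions)) * dt)

    # Example with arbitrary placeholder trajectories (s = 5 states, a = 3 actions).
    rng = np.random.default_rng(0)
    xs = rng.standard_normal((100, 5))   # hypothetical state trajectory
    us = rng.standard_normal((100, 3))   # hypothetical action trajectory
    print(lqr_cost(xs, us, Q=np.eye(5), R=np.eye(3)))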
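As a reading aid for the Open Datasets row, the following is a minimal sketch (not the authors' code) of how the synthetic NLDI described there could be generated: the matrices A, B, G, C, and D are drawn i.i.d. from standard normal distributions, and the disturbance comes from a randomly-initialized network whose output is rescaled to satisfy the norm bound ||w|| <= ||Cx + Du|| of the NLDI form (3). The two-layer network architecture, time step, and Euler integration are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    s, a, d, k = 5, 3, 2, 2   # state, action, disturbance, and norm-bound output dimensions

    # System matrices drawn i.i.d. from standard normal distributions.
    A = rng.standard_normal((s, s))
    B = rng.standard_normal((s, a))
    G = rng.standard_normal((s, d))
    C = rng.standard_normal((k, s))
    D = rng.standard_normal((k, a))

    # Randomly-initialized two-layer network producing the raw disturbance.
    W1, b1 = rng.standard_normal((16, s + a)), rng.standard_normal(16)
    W2, b2 = rng.standard_normal((d, 16)), rng.standard_normal(d)

    def disturbance(x, u):
        """Network output rescaled so that ||w|| <= ||Cx + Du||."""
        w_raw = W2 @ np.tanh(W1 @ np.concatenate([x, u]) + b1) + b2
        bound = np.linalg.norm(C @ x + D @ u)
        norm = np.linalg.norm(w_raw)
        return w_raw if norm <= bound else w_raw * (bound / norm)

    def step(x, u, dt=0.01):
        """One Euler step of the NLDI dynamics x' = Ax + Bu + Gw."""
        return x + dt * (A @ x + B @ u + G @ disturbance(x, u))

    # Example: roll the system forward from a random initial state with zero action.
    x = rng.standard_normal(s)
    for _ in range(10):
        x = step(x, np.zeros(a))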
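Finally, the Experiment Setup row can be condensed into a small configuration sketch. The dictionaries below simply collect the quoted hyperparameters for reference; the key names are illustrative and do not correspond to the released code.

    ROBUST_MBP_CONFIG = {
        "num_updates": 1000,            # gradient-descent updates
        "rollouts_per_update": 20,
        "learning_rates_tried": [1e-3, 1e-4, 1e-5],
        "best_lr_nonrobust": 1e-3,      # best for the non-robust planner
        "best_lr_robust": 1e-4,         # best for the robust planner
    }

    ROBUST_PPO_CONFIG = {
        "num_updates": 50_000,
        "rollouts_per_update": 8,
        "ppo_hyperparameters": "defaults from the original PPO paper (Schulman et al., 2017)",
        "model_selection": "best checkpoint on a hold-out set of initial conditions",
    }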