Enforcing robust control guarantees within neural network policies
Authors: Priya L. Donti, Melrose Roderick, Mahyar Fazlyab, J. Zico Kolter
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experiments: Having instantiated our general framework, we demonstrate the power of our approach on a variety of simulated control domains. In particular, we evaluate performance on the following metrics: (1) Average-case performance: How well does the method optimize the performance objective (i.e., LQR cost) under average (non-worst-case) dynamics? (2) Worst-case stability: Does the method remain stable even when subjected to adversarial (worst-case) dynamics? In all cases, we show that our method is able to improve performance over traditional robust controllers under average conditions, while still guaranteeing stability under worst-case conditions. |
| Researcher Affiliation | Collaboration | Priya L. Donti1, Melrose Roderick1, Mahyar Fazlyab2, J. Zico Kolter1,3 1Carnegie Mellon University, 2Johns Hopkins University, 3Bosch Center for AI |
| Pseudocode | Yes | Algorithm 1 Learning provably robust controllers with deep RL |
| Open Source Code | Yes | Code for all experiments is available at https://github.com/locuslab/robust-nn-control |
| Open Datasets | No | For our experiments, we build upon the microgrid setting given in Lam et al. (2016). In this system, the state x ∈ ℝ³ captures voltage deviations, frequency deviations, and the amount of power generated by a diesel generator connected to the grid; the action u ∈ ℝ² describes the current associated with a storage device and a solar PV inverter; and the disturbance w ∈ ℝ describes the difference between the amount of power demanded and the amount of power produced by solar panels on the grid. We generate NLDIs of the form (3) with s = 5, a = 3, and d = k = 2 by generating the matrices A, B, G, C, and D i.i.d. from normal distributions, and producing the disturbance w(t) using a randomly-initialized neural network (with its output scaled to satisfy the norm-bound on the disturbance). The paper describes generating simulated environments/systems, not using pre-existing public datasets with concrete access information. A code sketch of this generation procedure appears after the table. |
| Dataset Splits | Yes | Robust PPO is trained for 50,000 updates, where each update samples 8 roll-outs; we choose the model that performs best on a hold-out set of initial conditions during training. |
| Hardware Specification | Yes | All experiments were run on an XPS 13 laptop with an Intel i7 processor. |
| Software Dependencies | No | The paper mentions specific algorithms and techniques but does not list software dependencies with version numbers (e.g., Python, PyTorch, or specific solver versions). |
| Experiment Setup | Yes | Robust MBP is optimized using gradient descent for 1,000 updates, where each update samples 20 roll-outs. Robust PPO is trained for 50,000 updates, where each update samples 8 roll-outs... The learning rate we chose for our model-based planner... we tried learning rates of 1×10⁻³, 1×10⁻⁴, and 1×10⁻⁵, and found 1×10⁻³ worked best for the non-robust version and 1×10⁻⁴ worked best for the robust version. For our PPO hyperparameters, we simply used those from the original PPO paper. (A hypothetical sketch of this training schedule appears below.) |
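
For concreteness, here is a minimal sketch of the random NLDI generation quoted in the "Open Datasets" row. This is not the authors' released code (see the linked repository for that); the MLP architecture for the disturbance network, the random seed, and the exact scaling mechanics are assumptions, while the dimensions (s = 5, a = 3, d = k = 2), the i.i.d. normal sampling of A, B, G, C, D, and the norm-bound ‖w‖ ≤ ‖Cx + Du‖ follow the paper's description of NLDI form (3).

```python
# Hypothetical sketch of generating a random NLDI environment; not the
# authors' code. Only the dimensions, i.i.d. normal matrix sampling, and
# the norm-bound scaling are taken from the paper's description.
import numpy as np
import torch
import torch.nn as nn

s, a, d, k = 5, 3, 2, 2  # state, action, disturbance, and norm-bound dimensions

rng = np.random.default_rng(0)  # seed is an assumption
A = rng.standard_normal((s, s))  # system matrices drawn i.i.d. from normal distributions
B = rng.standard_normal((s, a))
G = rng.standard_normal((s, d))
C = rng.standard_normal((k, s))
D = rng.standard_normal((k, a))

# Randomly-initialized network producing the raw (unscaled) disturbance;
# the architecture here is an assumption.
w_net = nn.Sequential(nn.Linear(s + a, 16), nn.Tanh(), nn.Linear(16, d))

def disturbance(x, u):
    """Scale the network output so the NLDI bound ||w|| <= ||Cx + Du|| holds."""
    bound = np.linalg.norm(C @ x + D @ u)
    w = w_net(torch.as_tensor(np.concatenate([x, u]), dtype=torch.float32))
    w = w.detach().numpy().astype(np.float64)
    n = np.linalg.norm(w)
    return w * (bound / n) if n > 0 else w

def dynamics(x, u):
    """Continuous-time NLDI dynamics: xdot = Ax + Bu + Gw."""
    return A @ x + B @ u + G @ disturbance(x, u)
```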
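
The "Experiment Setup" row also implies a simple training schedule for the model-based planner. The sketch below is likewise hypothetical: the dynamics, horizon, cost weights, policy architecture, and `rollout_cost` helper are placeholders standing in for the paper's differentiable roll-out; only the update count (1,000), the 20 roll-outs per update, and the learning rate (1×10⁻⁴ for the robust variant) come from the row above.

```python
# Hypothetical sketch of the "Robust MBP" training schedule; everything
# except the update count, roll-outs per update, and learning rate is a
# placeholder assumption.
import torch
import torch.nn as nn

s, a_dim, dt, T = 5, 3, 0.05, 50                 # placeholder dimensions and horizon
A = 0.1 * torch.randn(s, s)                      # placeholder dynamics matrices
B = torch.randn(s, a_dim)
Q, R = torch.eye(s), torch.eye(a_dim)            # quadratic (LQR-style) cost weights

policy = nn.Sequential(nn.Linear(s, 64), nn.ReLU(), nn.Linear(64, a_dim))
opt = torch.optim.SGD(policy.parameters(), lr=1e-4)  # 1e-4 reported best for robust MBP

def rollout_cost(x):
    """Differentiable roll-out accumulating the quadratic cost over T steps."""
    cost = torch.zeros(())
    for _ in range(T):
        u = policy(x)
        cost = cost + x @ Q @ x + u @ R @ u
        x = x + dt * (A @ x + B @ u)             # Euler-discretized dynamics
    return cost

for update in range(1000):                       # Robust MBP: 1,000 gradient updates
    loss = torch.stack([rollout_cost(torch.randn(s))   # each update samples
                        for _ in range(20)]).mean()    # 20 roll-outs
    opt.zero_grad()
    loss.backward()
    opt.step()
```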