Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Enforcing robust control guarantees within neural network policies
Authors: Priya L. Donti, Melrose Roderick, Mahyar Fazlyab, J. Zico Kolter
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experiments: Having instantiated our general framework, we demonstrate the power of our approach on a variety of simulated control domains. In particular, we evaluate performance on the following metrics: (1) Average-case performance: How well does the method optimize the performance objective (i.e., LQR cost) under average (non-worst-case) dynamics? (2) Worst-case stability: Does the method remain stable even when subjected to adversarial (worst-case) dynamics? In all cases, we show that our method is able to improve performance over traditional robust controllers under average conditions, while still guaranteeing stability under worst-case conditions. |
| Researcher Affiliation | Collaboration | Priya L. Donti¹, Melrose Roderick¹, Mahyar Fazlyab², J. Zico Kolter¹,³; ¹Carnegie Mellon University, ²Johns Hopkins University, ³Bosch Center for AI |
| Pseudocode | Yes | Algorithm 1 Learning provably robust controllers with deep RL |
| Open Source Code | Yes | Code for all experiments is available at https://github.com/locuslab/robust-nn-control |
| Open Datasets | No | For our experiments, we build upon the microgrid setting given in Lam et al. (2016). In this system, the state x ∈ ℝ³ captures voltage deviations, frequency deviations, and the amount of power generated by a diesel generator connected to the grid; the action u ∈ ℝ² describes the current associated with a storage device and a solar PV inverter; and the disturbance w ∈ ℝ describes the difference between the amount of power demanded and the amount of power produced by solar panels on the grid. We generate NLDIs (norm-bounded linear differential inclusions) of the form in Eq. (3) with s = 5, a = 3, and d = k = 2 by generating matrices A, B, G, C and D i.i.d. from normal distributions, and producing the disturbance w(t) using a randomly-initialized neural network (with its output scaled to satisfy the norm-bound on the disturbance). The paper describes generating simulated environments/systems rather than using pre-existing public datasets with concrete access information. |
| Dataset Splits | Yes | Robust PPO is trained for 50,000 updates, where each update samples 8 roll-outs; we choose the model that performs best on a hold-out set of initial conditions during training. |
| Hardware Specification | Yes | All experiments were run on an XPS 13 laptop with an Intel i7 processor. |
| Software Dependencies | No | The paper mentions specific algorithms and techniques but does not list software dependencies with version numbers (e.g., Python, PyTorch, or specific solver versions). |
| Experiment Setup | Yes | Robust MBP is optimized using gradient descent for 1,000 updates, where each update samples 20 roll-outs. Robust PPO is trained for 50,000 updates, where each update samples 8 roll-outs... The learning rate we chose for our model-based planner... we tried learning rates of 1 × 10⁻³, 1 × 10⁻⁴, and 1 × 10⁻⁵ and found 1 × 10⁻³ worked best for the non-robust version and 1 × 10⁻⁴ worked best for the robust version. For our PPO hyperparameters, we simply used those used in the original PPO paper. |
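The synthetic NLDI setup quoted in the Open Datasets row can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation; their code is in the linked robust-nn-control repository. Several assumptions apply. Eq. (3) is not reproduced on this page, so the sketch assumes the NLDI form x′ = Ax + Bu + Gw with ‖w‖₂ ≤ ‖Cx + Du‖₂. Matrix entries are drawn from a standard normal distribution, since the exact distribution parameters are not given here. The tanh disturbance network, its hidden width, the clipping rule used to enforce the norm bound, and the forward-Euler step size are illustrative choices, and all function names are hypothetical. Only the Python standard library is used.

```python
import math
import random

# Hypothetical sketch of a random NLDI, assuming the form
#   x'(t) = A x + B u + G w,   with ||w||_2 <= ||C x + D u||_2
S, A_DIM, D_DIM, K_DIM = 5, 3, 2, 2  # s = 5, a = 3, d = k = 2 (quoted above)


def randn_matrix(rows, cols, rng):
    """Matrix as a list of rows with i.i.d. entries drawn from N(0, 1) (assumed)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def vec_add(*vs):
    """Elementwise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def norm(v):
    """Euclidean (L2) norm."""
    return math.sqrt(sum(x * x for x in v))


def make_nldi(seed=0):
    """Draw A, B, G, C, D i.i.d. from a standard normal distribution."""
    rng = random.Random(seed)
    return {
        "A": randn_matrix(S, S, rng),
        "B": randn_matrix(S, A_DIM, rng),
        "G": randn_matrix(S, D_DIM, rng),
        "C": randn_matrix(K_DIM, S, rng),
        "D": randn_matrix(K_DIM, A_DIM, rng),
    }


def make_disturbance_net(seed=1, hidden=16):
    """Hypothetical randomly initialized one-hidden-layer tanh network x -> w_raw."""
    rng = random.Random(seed)
    w1 = randn_matrix(hidden, S, rng)
    w2 = randn_matrix(D_DIM, hidden, rng)
    return lambda x: matvec(w2, [math.tanh(h) for h in matvec(w1, x)])


def bounded_disturbance(nldi, net, x, u):
    """Shrink the network output only if needed so that ||w||_2 <= ||C x + D u||_2."""
    bound = norm(vec_add(matvec(nldi["C"], x), matvec(nldi["D"], u)))
    w_raw = net(x)
    raw_norm = norm(w_raw)
    if raw_norm == 0.0:
        return w_raw  # a zero disturbance trivially satisfies the bound
    scale = min(1.0, bound / raw_norm)
    return [scale * wi for wi in w_raw]


def euler_step(nldi, x, u, w, dt=0.01):
    """One forward-Euler step of x' = A x + B u + G w (dt is an illustrative choice)."""
    xdot = vec_add(matvec(nldi["A"], x), matvec(nldi["B"], u), matvec(nldi["G"], w))
    return [xi + dt * di for xi, di in zip(x, xdot)]


if __name__ == "__main__":
    nldi = make_nldi(seed=0)
    net = make_disturbance_net(seed=1)
    x = [1.0] * S
    for _ in range(100):
        u = [0.0] * A_DIM  # placeholder zero controller, not a learned policy
        w = bounded_disturbance(nldi, net, x, u)
        x = euler_step(nldi, x, u, w)
    print("state after 100 steps:", [round(xi, 3) for xi in x])
```

Clipping preserves the direction of the network output and shrinks it only when it would violate the bound. Other scalings, such as always rescaling to the full bound, are equally consistent with the quoted description "output scaled to satisfy the norm-bound."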