Stabilizing Dynamical Systems via Policy Gradient Methods
Authors: Juan Perdomo, Jack Umenberger, Max Simchowitz
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that this method efficiently recovers a stabilizing controller for linear systems, and for smooth, nonlinear systems within a neighborhood of their equilibria. Our approach overcomes a significant limitation of prior work, namely the need for a pre-given stabilizing control policy. We empirically evaluate the effectiveness of our approach on common control benchmarks. |
| Researcher Affiliation | Academia | Juan C. Perdomo (University of California, Berkeley); Jack Umenberger (MIT); Max Simchowitz (MIT). Correspondence to jcperdomo@berkeley.edu |
| Pseudocode | Yes | Figure 1: Discount Annealing. Initialize: objective $J_\gamma(\cdot)$, $\gamma_0 \in (0, \rho(A)^{-2})$, $K_0 \leftarrow 0$, and $Q \leftarrow I$, $R \leftarrow I$. For $t = 0, 1, \ldots$ (see the implementation sketch below the table). |
| Open Source Code | No | The paper does not provide any explicit statements or links regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes experiments on a 'simulated nonlinear system' (cart-pole) but does not provide concrete access information (e.g., link, DOI, or specific citation) for a publicly available or open dataset used for training. |
| Dataset Splits | No | The paper does not specify exact dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | Simulations were carried out in PyTorch [Paszke et al., 2019] and run on a single GPU. This statement is too general and does not specify the exact model of the GPU or other hardware components used. |
| Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2019]' and 'Adam [Kingma and Ba, 2014]' but does not specify version numbers for these software dependencies, which is necessary for reproducibility. |
| Experiment Setup | Yes | Setup. The discount annealing algorithm of Figure 1 was implemented as follows. ... We used N = 5000 and H = 1000 in our experiments. For the cost function, we used $Q = T_s \cdot I$ and $R = T_s$. ... Instead of using SGD updates for policy gradients, we use Adam [Kingma and Ba, 2014] with a learning rate of $\eta = 0.01/r$. Furthermore, we replace the policy gradient termination criterion in Step 2 (Eq. (2.2)) by instead halting after a fixed number (M = 200) of gradient descent steps. ... Finally, we used an initial discount factor $\gamma_0 = 0.9\,\lVert A_{\mathrm{jac}} \rVert_2^{-2}$, where $A_{\mathrm{jac}}$ denotes the linearization of the (discrete-time) cart-pole about the vertical equilibrium. (A short sketch of this $\gamma_0$ computation follows the annealing sketch below the table.) |
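
To make the annealing loop quoted in the Pseudocode row concrete, the following is a minimal, hypothetical sketch, not the paper's implementation. It assumes a known linear system $x_{t+1} = A x_t + B u_t$ with $Q = I$, $R = I$, replaces the paper's model-free policy-gradient inner loop with finite-difference gradient descent, and uses an illustrative discount-update rule (the paper specifies both the inner loop and the exact $\gamma$ schedule precisely). All function names here are invented for illustration.

```python
import numpy as np

def discounted_cost(K, A, B, Q, R, gamma, horizon=200):
    """Finite-horizon approximation of the gamma-discounted LQR cost J_gamma(K)
    for an initial state with identity covariance and no process noise."""
    A_cl = A + B @ K                        # closed-loop dynamics
    M = Q + K.T @ R @ K                     # per-step cost matrix
    cost, Sigma = 0.0, np.eye(A.shape[0])   # Sigma tracks E[x_t x_t^T]
    for t in range(horizon):
        cost += gamma**t * np.trace(M @ Sigma)
        Sigma = A_cl @ Sigma @ A_cl.T
    return cost

def gradient_inner_loop(K, A, B, Q, R, gamma, steps=100, lr=0.05, eps=1e-4):
    """Placeholder for the policy-optimization step: central finite-difference
    gradient descent on J_gamma with backtracking (the paper instead uses
    model-free policy-gradient estimates from N rollouts of horizon H)."""
    for _ in range(steps):
        grad = np.zeros_like(K)
        for i in range(K.shape[0]):
            for j in range(K.shape[1]):
                E = np.zeros_like(K)
                E[i, j] = eps
                grad[i, j] = (discounted_cost(K + E, A, B, Q, R, gamma)
                              - discounted_cost(K - E, A, B, Q, R, gamma)) / (2 * eps)
        current, step, improved = discounted_cost(K, A, B, Q, R, gamma), lr, False
        while step > 1e-8:                  # accept only cost-decreasing updates
            K_try = K - step * grad
            if discounted_cost(K_try, A, B, Q, R, gamma) < current:
                K, improved = K_try, True
                break
            step /= 2.0
        if not improved:                    # no descent at this resolution; stop early
            break
    return K

def discount_annealing(A, B, rounds=15):
    """Outer loop: start with a discount small enough that K = 0 stabilizes the
    discounted system, then alternate policy optimization and discount increases."""
    n, m = A.shape[0], B.shape[1]
    Q, R = np.eye(n), np.eye(m)
    rho = max(abs(np.linalg.eigvals(A)))    # spectral radius of A
    gamma = 0.9 / rho**2                    # gamma_0 in (0, rho(A)^{-2})
    K = np.zeros((m, n))
    for _ in range(rounds):
        K = gradient_inner_loop(K, A, B, Q, R, gamma)
        if gamma >= 1.0:
            break
        # Illustrative schedule only (not the paper's rule): re-apply the
        # initialization logic to the current closed loop so that K stays
        # stabilizing for the re-discounted system; never let gamma decrease.
        A_cl = A + B @ K
        gamma = min(1.0, max(gamma, 0.9 / max(abs(np.linalg.eigvals(A_cl)))**2))
    return K

# Example: a small unstable linear system.
A = np.array([[1.2, 0.5], [0.0, 1.1]])
B = np.array([[0.0], [1.0]])
K = discount_annealing(A, B)
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A + B @ K))))
```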
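Similarly, the choice $\gamma_0 = 0.9\,\lVert A_{\mathrm{jac}} \rVert_2^{-2}$ reported in the Experiment Setup row is straightforward to reproduce once a discrete-time Jacobian is available. The matrix below is a stand-in with made-up values, not the paper's cart-pole linearization, which depends on the physical parameters and sampling time $T_s$ used by the authors.

```python
import numpy as np

# Stand-in discrete-time Jacobian at the upright equilibrium (illustrative values
# only; the paper's A_jac depends on the cart-pole parameters and sampling time Ts).
A_jac = np.array([[1.00, 0.02, 0.00, 0.00],
                  [0.00, 1.00, 0.03, 0.00],
                  [0.00, 0.00, 1.00, 0.02],
                  [0.00, 0.00, 0.59, 1.00]])

spectral_norm = np.linalg.norm(A_jac, 2)   # largest singular value of A_jac
gamma_0 = 0.9 / spectral_norm**2           # gamma_0 = 0.9 * ||A_jac||_2^{-2}
print(f"gamma_0 = {gamma_0:.4f}")
```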