Stabilizing Dynamical Systems via Policy Gradient Methods

Authors: Juan Perdomo, Jack Umenberger, Max Simchowitz

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that this method efficiently recovers a stabilizing controller for linear systems, and for smooth, nonlinear systems within a neighborhood of their equilibria. Our approach overcomes a significant limitation of prior work, namely the need for a pre-given stabilizing control policy. We empirically evaluate the effectiveness of our approach on common control benchmarks.
Researcher Affiliation | Academia | Juan C. Perdomo (University of California, Berkeley), Jack Umenberger (MIT), Max Simchowitz (MIT). Correspondence to jcperdomo@berkeley.edu
Pseudocode | Yes | Figure 1: Discount Annealing. Initialize: objective J(· | ·), γ_0 ∈ (0, ρ(A)^{-2}), K_0 ← 0, and Q ← I, R ← I. For t = 0, 1, ... (a runnable sketch of this loop appears after the table).
Open Source Code | No | The paper does not provide any explicit statements or links regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes experiments on a 'simulated nonlinear system' (cart-pole) but does not provide concrete access information (e.g., link, DOI, or specific citation) for a publicly available or open dataset used for training.
Dataset Splits | No | The paper does not specify exact dataset splits (e.g., percentages or counts) for training, validation, or testing.
Hardware Specification | No | Simulations were carried out in PyTorch [Paszke et al., 2019] and run on a single GPU. This statement is too general and does not specify the exact model of the GPU or other hardware components used.
Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2019]' and 'Adam [Kingma and Ba, 2014]' but does not specify version numbers for these software dependencies, which is necessary for reproducibility.
Experiment Setup | Yes | Setup. The discount annealing algorithm of Figure 1 was implemented as follows. ... We used N = 5000 and H = 1000 in our experiments. For the cost function, we used Q = T_s · I and R = T_s. ... Instead of using SGD updates for policy gradients, we use Adam [Kingma and Ba, 2014] with a learning rate of η = 0.01/r. Furthermore, we replace the policy gradient termination criterion in Step 2 (Eq. (2.2)) by instead halting after a fixed number (M = 200) of gradient descent steps. ... Finally, we used an initial discount factor γ_0 = 0.9 ‖A_jac‖_2^{-2}, where A_jac denotes the linearization of the (discrete-time) cart-pole about the vertical equilibrium. (A PyTorch sketch of this inner optimization appears below.)
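
The Figure 1 loop quoted in the Pseudocode row can be summarized in code. The sketch below is a minimal illustration only: it assumes known system matrices (A, B), substitutes a finite-difference gradient for the paper's model-free zeroth-order policy-gradient estimates, and uses a placeholder multiplicative annealing rule for γ. The helper names (`discounted_cost`, `discount_annealing`) and all default hyperparameters are hypothetical.

```python
import numpy as np

def discounted_cost(A, B, K, Q, R, gamma, horizon=200):
    """gamma-discounted LQR cost of the policy u_t = -K x_t,
    averaged over unit-vector initial states (illustrative choice)."""
    n = A.shape[0]
    total = 0.0
    for x0 in np.eye(n):
        x = x0.copy()
        for t in range(horizon):
            u = -K @ x
            total += gamma ** t * (x @ Q @ x + u @ R @ u)
            x = A @ x + B @ u
    return total / n

def discount_annealing(A, B, inner_steps=200, lr=1e-3, anneal=1.1, eps=1e-4):
    """Sketch of the Figure 1 outer loop: optimize K on the discounted
    objective, then push gamma toward 1.  The finite-difference gradient
    and the fixed multiplicative anneal factor are placeholder choices."""
    n, m = B.shape
    Q, R = np.eye(n), np.eye(m)
    rho = max(np.max(np.abs(np.linalg.eigvals(A))), 1e-12)  # spectral radius of A
    gamma = min(1.0, 0.9 / rho ** 2)                        # gamma_0 in (0, rho(A)^{-2})
    K = np.zeros((m, n))                                    # K_0 = 0
    while True:
        for _ in range(inner_steps):                        # policy-gradient phase on J_gamma
            grad = np.zeros_like(K)
            for i in range(m):
                for j in range(n):
                    E = np.zeros_like(K)
                    E[i, j] = eps
                    grad[i, j] = (discounted_cost(A, B, K + E, Q, R, gamma)
                                  - discounted_cost(A, B, K - E, Q, R, gamma)) / (2 * eps)
            K = K - lr * grad
        if gamma >= 1.0:                                    # K now targets the undiscounted objective
            return K
        gamma = min(1.0, anneal * gamma)                    # anneal the discount factor upward
```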
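
The inner optimization described in the Experiment Setup row can likewise be sketched in PyTorch, the framework the authors report using. The two-point zeroth-order gradient estimator, the `cost_fn` rollout interface, and the smoothing radius r = 0.1 are assumptions made for illustration; only the reported values N = 5000, H = 1000, M = 200, and η = 0.01/r come from the quoted setup.

```python
import torch

def zeroth_order_grad(cost_fn, K, r, num_samples):
    """Two-point zeroth-order estimate of the gradient of cost_fn at K,
    using random unit-sphere directions and smoothing radius r
    (a standard estimator; the paper's exact sampling scheme may differ)."""
    d = K.numel()
    grad = torch.zeros_like(K)
    with torch.no_grad():
        for _ in range(num_samples):
            U = torch.randn_like(K)
            U = U / U.norm()
            delta = cost_fn(K + r * U) - cost_fn(K - r * U)
            grad += (d / (2.0 * r)) * delta * U
    return grad / num_samples

def inner_policy_gradient(cost_fn, K0, r=0.1, num_samples=5000, steps=200):
    """Inner loop matching the reported setup: Adam with lr = 0.01 / r and a
    fixed budget of M = 200 updates, each built from N = 5000 rollouts.
    cost_fn is assumed to return the discounted cost of a horizon-H = 1000
    rollout under the gains K (hypothetical interface)."""
    K = K0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([K], lr=0.01 / r)
    for _ in range(steps):
        opt.zero_grad()
        K.grad = zeroth_order_grad(cost_fn, K, r, num_samples)
        opt.step()
    return K.detach()

# Initial discount factor as reported: gamma_0 = 0.9 * ||A_jac||_2^{-2}, where
# A_jac is the discrete-time linearization about the vertical equilibrium
# (assumed available as a tensor):
#   gamma0 = 0.9 / torch.linalg.matrix_norm(A_jac, ord=2) ** 2
```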