Stabilizing Dynamical Systems via Policy Gradient Methods

Authors: Juan Perdomo, Jack Umenberger, Max Simchowitz

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove that this method efficiently recovers a stabilizing controller for linear systems, and for smooth, nonlinear systems within a neighborhood of their equilibria. Our approach overcomes a significant limitation of prior work, namely the need for a pre-given stabilizing control policy. We empirically evaluate the effectiveness of our approach on common control benchmarks.
Researcher Affiliation | Academia | Juan C. Perdomo (University of California, Berkeley), Jack Umenberger (MIT), Max Simchowitz (MIT). Correspondence to jcperdomo@berkeley.edu
Pseudocode | Yes | Figure 1: Discount Annealing. Initialize: objective J(· | ·), γ_0 ∈ (0, ρ(A)^{-2}), K_0 ← 0, and Q ← I, R ← I. For t = 0, 1, ... (a runnable sketch of this loop appears after the table).
Open Source Code | No | The paper does not provide any explicit statements or links regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes experiments on a 'simulated nonlinear system' (cart-pole) but does not provide concrete access information (e.g., link, DOI, or specific citation) for a publicly available or open dataset used for training.
Dataset Splits | No | The paper does not specify exact dataset splits (e.g., percentages or counts) for training, validation, or testing.
Hardware Specification | No | Simulations were carried out in PyTorch [Paszke et al., 2019] and run on a single GPU. This statement is too general and does not specify the exact model of the GPU or other hardware components used.
Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2019]' and 'Adam [Kingma and Ba, 2014]' but does not specify version numbers for these software dependencies, which is necessary for reproducibility.
Experiment Setup | Yes | Setup. The discount annealing algorithm of Figure 1 was implemented as follows. ... We used N = 5000 and H = 1000 in our experiments. For the cost function, we used Q = T_s · I and R = T_s. ... Instead of using SGD updates for policy gradients, we use Adam [Kingma and Ba, 2014] with a learning rate of η = 0.01/r. Furthermore, we replace the policy gradient termination criterion in Step 2 (Eq. (2.2)) by instead halting after a fixed number (M = 200) of gradient descent steps. ... Finally, we used an initial discount factor γ_0 = 0.9 ‖A_jac‖_2^{-2}, where A_jac denotes the linearization of the (discrete-time) cart-pole about the vertical equilibrium. (A PyTorch sketch of this inner optimization appears below.)
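
The Figure 1 loop quoted in the Pseudocode row can be summarized in code. The sketch below is a minimal illustration only: it assumes known system matrices (A, B), substitutes a finite-difference gradient for the paper's model-free zeroth-order policy-gradient estimates, and uses a placeholder multiplicative annealing rule for γ. The helper names (`discounted_cost`, `discount_annealing`) and all default hyperparameters are hypothetical.

```python
import numpy as np

def discounted_cost(A, B, K, Q, R, gamma, horizon=200):
    """gamma-discounted LQR cost of the policy u_t = -K x_t,
    averaged over unit-vector initial states (illustrative choice)."""
    n = A.shape[0]
    total = 0.0
    for x0 in np.eye(n):
        x = x0.copy()
        for t in range(horizon):
            u = -K @ x
            total += gamma ** t * (x @ Q @ x + u @ R @ u)
            x = A @ x + B @ u
    return total / n

def discount_annealing(A, B, inner_steps=200, lr=1e-3, anneal=1.1, eps=1e-4):
    """Sketch of the Figure 1 outer loop: optimize K on the discounted
    objective, then push gamma toward 1.  The finite-difference gradient
    and the fixed multiplicative anneal factor are placeholder choices."""
    n, m = B.shape
    Q, R = np.eye(n), np.eye(m)
    rho = max(np.max(np.abs(np.linalg.eigvals(A))), 1e-12)  # spectral radius of A
    gamma = min(1.0, 0.9 / rho ** 2)                        # gamma_0 in (0, rho(A)^{-2})
    K = np.zeros((m, n))                                    # K_0 = 0
    while True:
        for _ in range(inner_steps):                        # policy-gradient phase on J_gamma
            grad = np.zeros_like(K)
            for i in range(m):
                for j in range(n):
                    E = np.zeros_like(K)
                    E[i, j] = eps
                    grad[i, j] = (discounted_cost(A, B, K + E, Q, R, gamma)
                                  - discounted_cost(A, B, K - E, Q, R, gamma)) / (2 * eps)
            K = K - lr * grad
        if gamma >= 1.0:                                    # K now targets the undiscounted objective
            return K
        gamma = min(1.0, anneal * gamma)                    # anneal the discount factor upward
```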
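
The inner optimization described in the Experiment Setup row can likewise be sketched in PyTorch, the framework the authors report using. The two-point zeroth-order gradient estimator, the `cost_fn` rollout interface, and the smoothing radius r = 0.1 are assumptions made for illustration; only the reported values N = 5000, H = 1000, M = 200, and η = 0.01/r come from the quoted setup.

```python
import torch

def zeroth_order_grad(cost_fn, K, r, num_samples):
    """Two-point zeroth-order estimate of the gradient of cost_fn at K,
    using random unit-sphere directions and smoothing radius r
    (a standard estimator; the paper's exact sampling scheme may differ)."""
    d = K.numel()
    grad = torch.zeros_like(K)
    with torch.no_grad():
        for _ in range(num_samples):
            U = torch.randn_like(K)
            U = U / U.norm()
            delta = cost_fn(K + r * U) - cost_fn(K - r * U)
            grad += (d / (2.0 * r)) * delta * U
    return grad / num_samples

def inner_policy_gradient(cost_fn, K0, r=0.1, num_samples=5000, steps=200):
    """Inner loop matching the reported setup: Adam with lr = 0.01 / r and a
    fixed budget of M = 200 updates, each built from N = 5000 rollouts.
    cost_fn is assumed to return the discounted cost of a horizon-H = 1000
    rollout under the gains K (hypothetical interface)."""
    K = K0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([K], lr=0.01 / r)
    for _ in range(steps):
        opt.zero_grad()
        K.grad = zeroth_order_grad(cost_fn, K, r, num_samples)
        opt.step()
    return K.detach()

# Initial discount factor as reported: gamma_0 = 0.9 * ||A_jac||_2^{-2}, where
# A_jac is the discrete-time linearization about the vertical equilibrium
# (assumed available as a tensor):
#   gamma0 = 0.9 / torch.linalg.matrix_norm(A_jac, ord=2) ** 2
```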