Structured Policy Iteration for Linear Quadratic Regulator

Authors: Youngsuk Park, Ryan Rossi, Zheng Wen, Gang Wu, Handong Zhao

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the experiments demonstrate the advantages of S-PI in terms of balancing the LQR performance and level of structure by varying the weight parameter. ... In experiments, we consider an LQR system for the purpose of validating the theoretical results and basic properties of the S-PI algorithm.
Researcher Affiliation | Collaboration | (1) Stanford University, (2) Adobe Research, (3) Google DeepMind.
Pseudocode | Yes | Algorithm 1 Structured Policy Iteration (S-PI) ... Algorithm 2 Subroutine: ProxGrad($f(K), \eta, r, \lambda$) ... Algorithm 3 Model-free Structured Policy Iteration (Model-free S-PI)
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | In these experiments, we use the unstable Laplacian system (Recht, 2019). ... Large Laplacian dynamics. $A \in \mathbb{R}^{n \times n}$ where $A_{ij} = 1.1$ if $i = j$, $0.1$ if $i = j + 1$ or $j = i + 1$, and $0$ otherwise; $B = Q = I_n \in \mathbb{R}^{n \times n}$ and $R = 1000 \times I_n \in \mathbb{R}^{n \times n}$.
Dataset Splits | No | The paper uses a synthetic system defined by equations and parameters for its experiments, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts.
Hardware Specification | Yes | we used a MacBook Air (with a 1.3 GHz Intel Core i5 CPU) for experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | For the Laplacian system, we regard $(n, m) = (3, 3)$, $(n, m) = (20, 20)$, and $(n, m) = (10^3, 10^3)$ dimension as small, medium, and large size of system. In addition, we experiment with Lasso regularizer over various $\lambda = 10^{-2} \sim 10^{6}$. ... we set the initial stepsize $\eta = 1/\lambda$. For the backtracking linesearch, we set $\beta = 1/2$ and the convergence tolerance $\epsilon_{tol} = 10^{-6}$.
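
Since no source code is released, the following is a minimal sketch of how the synthetic Laplacian system quoted in the Open Datasets row could be constructed, together with the entrywise soft-thresholding map that is the standard proximal operator for a Lasso penalty. It is not the authors' implementation. The function names `laplacian_system` and `soft_threshold` are hypothetical, and the stability check assumes discrete-time dynamics of the form x_{t+1} = A x_t + B u_t.

```python
import numpy as np


def laplacian_system(n):
    """Build (A, B, Q, R) for the quoted Laplacian dynamics, with m = n.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Tridiagonal A: 1.1 on the diagonal, 0.1 on the first off-diagonals.
    A = 1.1 * np.eye(n) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
    B = np.eye(n)
    Q = np.eye(n)
    R = 1000.0 * np.eye(n)
    return A, B, Q, R


def soft_threshold(K, tau):
    """Proximal operator of tau * ||K||_1: shrink each entry toward zero.

    Assumed here as the Lasso prox step; the paper's ProxGrad subroutine
    is only named in the summary, not reproduced.
    """
    return np.sign(K) * np.maximum(np.abs(K) - tau, 0.0)


# Small instance, (n, m) = (3, 3).
A, B, Q, R = laplacian_system(3)

# Open-loop spectral radius. Under the discrete-time assumption, a value
# above 1 means the uncontrolled system is unstable.
rho = np.max(np.abs(np.linalg.eigvals(A)))
```

For the small instance the spectral radius exceeds 1, which is consistent with the system being described as unstable. Larger values of the threshold `tau` zero out more entries of a gain matrix, which is how a Lasso weight trades LQR performance for sparsity.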