reproducibilityindex.ai

Structured Policy Iteration for Linear Quadratic Regulator

Authors: Youngsuk Park, Ryan Rossi, Zheng Wen, Gang Wu, Handong Zhao

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, the experiments demonstrate the advantages of S-PI in terms of balancing the LQR performance and level of structure by varying the weight parameter. ... In experiments, we consider an LQR system for the purpose of validating the theoretical results and basic properties of the S-PI algorithm.
Researcher Affiliation	Collaboration	Stanford University; Adobe Research; Google DeepMind.
Pseudocode	Yes	Algorithm 1 Structured Policy Iteration (S-PI) ... Algorithm 2 Subroutine: ProxGrad(∇f(K), η, r, λ) ... Algorithm 3 Model-free Structured Policy Iteration (Model-free S-PI)
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
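Open Datasets	Yes	In these experiments, we use the unstable Laplacian system (Recht, 2019). ... Large Laplacian dynamics: $A \in \mathbb{R}^{n \times n}$ with $A_{ij} = 1.1$ if $i = j$, $A_{ij} = 0.1$ if $i = j + 1$ or $j = i + 1$, and $A_{ij} = 0$ otherwise; $B = Q = I_n \in \mathbb{R}^{n \times n}$ and $R = 1000\, I_n \in \mathbb{R}^{n \times n}$.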
Dataset Splits	No	The paper uses a synthetic system defined by equations and parameters for its experiments, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts.
Hardware Specification	Yes	we used a MacBook Air (with a 1.3 GHz Intel Core i5 CPU) for experiments.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup	Yes	For the Laplacian system, we regard $(n, m) = (3, 3)$, $(n, m) = (20, 20)$, and $(n, m) = (10^{3}, 10^{3})$ as small, medium, and large systems, respectively. In addition, we experiment with the Lasso regularizer over various values of $\lambda$ from $10^{-2}$ to $10^{6}$. ... we set the initial stepsize $\eta = 1/\lambda$. For the backtracking line search, we set $\beta = 1/2$ and the convergence tolerance $\epsilon_{\mathrm{tol}} = 10^{-6}$.
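The Open Datasets row defines the Laplacian system (Recht, 2019) in closed form, so it can be regenerated directly rather than downloaded. Below is a minimal NumPy sketch of that definition, assuming m = n as in the recorded (n, m) sizes; the helper name `laplacian_system` is hypothetical and not taken from the paper.

```python
import numpy as np


def laplacian_system(n):
    """Return (A, B, Q, R) for the Laplacian system as recorded above.

    Hypothetical helper; assumes m = n so that B = I_n is square.
    """
    # Tridiagonal A: 1.1 on the diagonal, 0.1 on the first sub/super-diagonals.
    A = 1.1 * np.eye(n) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
    B = np.eye(n)
    Q = np.eye(n)
    R = 1000.0 * np.eye(n)
    return A, B, Q, R


# Small system from the Experiment Setup row.
A, B, Q, R = laplacian_system(3)
rho = np.max(np.abs(np.linalg.eigvals(A)))
print(f"spectral radius of A: {rho:.4f}")  # prints 1.2414; > 1, so x_{t+1} = A x_t is unstable
```

The eigenvalues of this tridiagonal Toeplitz matrix are 1.1 + 0.2 cos(kπ/(n+1)) for k = 1, ..., n, so the largest one exceeds 1.1 for every n and the open-loop system is unstable at all three recorded sizes.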
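Algorithm 2 (ProxGrad) in the Pseudocode row consumes the gradient of the LQR cost f(K). The sketch below computes f(K) and ∇f(K) for a static linear policy under loudly stated assumptions: the policy is u_t = -K x_t, f(K) = Tr(P_K Σ_0) with Σ_0 an assumed initial-state covariance, and the gradient is the standard discrete-time LQR policy-gradient formula (as in Fazel et al., 2018). The paper's own sign and normalization conventions may differ. The function names are hypothetical, and the dense Kronecker solve is only practical for the small and medium sizes, not n = 10^3.

```python
import numpy as np


def solve_lyapunov(F, W):
    """Solve X = W + F.T @ X @ F by vectorization (hypothetical helper, small n only).

    Uses vec(F.T X F) = kron(F.T, F.T) vec(X) with column-major vec.
    """
    n = F.shape[0]
    lhs = np.eye(n * n) - np.kron(F.T, F.T)
    x = np.linalg.solve(lhs, W.reshape(-1, order="F"))
    return x.reshape(n, n, order="F")


def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost f(K) = Tr(P_K Sigma0) and its gradient, assuming the policy u = -K x.

    Returns (np.inf, None) when A - B K is not Schur stable.
    """
    F = A - B @ K  # closed-loop dynamics x_{t+1} = F x_t
    if np.max(np.abs(np.linalg.eigvals(F))) >= 1.0:
        return np.inf, None
    # Value matrix: P = Q + K^T R K + F^T P F
    P = solve_lyapunov(F, Q + K.T @ R @ K)
    # State correlation: Sigma = Sigma0 + F Sigma F^T (pass F.T to reuse the solver)
    Sigma = solve_lyapunov(F.T, Sigma0)
    cost = float(np.trace(P @ Sigma0))
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return cost, 2.0 * E @ Sigma


# Usage on a small hypothetical 2x2 system with a stabilizing gain.
A = np.array([[1.0, 0.2], [0.0, 0.9]])
I = np.eye(2)
cost, grad = lqr_cost_and_grad(0.5 * I, A, I, I, I, I)
print(cost)
print(grad)
```

Returning infinity for a destabilizing gain lets a line search treat such gains as infeasible without any special casing.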
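The Experiment Setup row pairs a Lasso regularizer with a backtracking line search (β = 1/2), an initial stepsize η = 1/λ, and a tolerance ε_tol = 10^-6. One standard way to combine these is proximal gradient descent, where the proximal operator of λ‖K‖_1 is elementwise soft-thresholding. The sketch below is that generic method with a common sufficient-decrease backtracking test, not necessarily the paper's exact Algorithm 2. The names `soft_threshold` and `prox_grad_lasso`, carrying the shrunken stepsize across iterations, and the stopping rule ‖K_{t+1} - K_t‖_F ≤ ε_tol are all assumptions.

```python
import numpy as np


def soft_threshold(X, tau):
    """Prox of tau * ||X||_1: shrink every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def prox_grad_lasso(f, grad_f, K0, lam, eta0, beta=0.5, tol=1e-6, max_iter=10_000):
    """Minimize f(K) + lam * ||K||_1 by proximal gradient with backtracking.

    K0 must have finite f(K0). If f returns np.inf (e.g. a destabilizing
    LQR gain), the sufficient-decrease test fails and the stepsize shrinks.
    """
    K = K0.copy()
    eta = eta0
    for _ in range(max_iter):
        fK, G = f(K), grad_f(K)
        while True:
            K_new = soft_threshold(K - eta * G, eta * lam)
            D = K_new - K
            # Accept if f(K_new) lies below the quadratic upper model around K.
            if f(K_new) <= fK + np.sum(G * D) + np.sum(D * D) / (2.0 * eta):
                break
            eta *= beta  # backtrack: shrink the stepsize by beta
        if np.linalg.norm(D) <= tol:
            return K_new
        K = K_new
    return K


# Toy check: for f(K) = 0.5 * ||K - C||_F^2 the exact minimizer is soft_threshold(C, lam).
C = np.array([[2.0, -0.3], [0.1, -1.5]])
lam = 0.5
K_hat = prox_grad_lasso(lambda K: 0.5 * np.sum((K - C) ** 2), lambda K: K - C,
                        np.zeros_like(C), lam, eta0=1.0 / lam)
print(np.round(K_hat, 4))
```

Since any candidate with infinite cost is rejected, every accepted iterate stays stabilizing whenever the starting gain is, which is one reason backtracking suits LQR costs that blow up outside the stabilizing set.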