Structured Policy Iteration for Linear Quadratic Regulator
Authors: Youngsuk Park, Ryan Rossi, Zheng Wen, Gang Wu, Handong Zhao
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the experiments demonstrate the advantages of S-PI in terms of balancing the LQR performance and level of structure by varying the weight parameter. ... In experiments, we consider a LQR system for the purpose of validating the theoretical results and basic properties of the S-PI algorithm. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Adobe Research, ³Google DeepMind. |
| Pseudocode | Yes | Algorithm 1 Structured Policy Iteration (S-PI) ... Algorithm 2 Subroutine: ProxGrad(f(K), η, r, λ) ... Algorithm 3 Model-free Structured Policy Iteration (Model-free S-PI) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
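| Open Datasets | Yes | In these experiments, we use the unstable Laplacian system (Recht, 2019). ... Large Laplacian dynamics. $A \in \mathbb{R}^{n \times n}$ where $A_{ij} = \begin{cases} 1.1, & i = j \\ 0.1, & i = j + 1 \text{ or } j = i + 1 \\ 0, & \text{otherwise} \end{cases}$, $B = Q = I_n \in \mathbb{R}^{n \times n}$ and $R = 1000 \, I_n \in \mathbb{R}^{n \times n}$. |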
| Dataset Splits | No | The paper uses a synthetic system defined by equations and parameters for its experiments, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | Yes | we used a MacBook Air (with a 1.3 GHz Intel Core i5 CPU) for experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For the Laplacian system, we regard $(n, m) = (3, 3)$, $(n, m) = (20, 20)$, and $(n, m) = (10^3, 10^3)$ dimension as small, medium, and large size of system. In addition, we experiment with Lasso regularizer over various $\lambda = 10^{-2} \sim 10^{6}$. ... we set the initial stepsize $\eta = 1/\lambda$. For the backtracking linesearch, we set $\beta = 1/2$ and the convergence tolerance $\epsilon_{\text{tol}} = 10^{-6}$. |
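
Since no source code is released, the synthetic system quoted in the table can be rebuilt directly from its definition. The following is a minimal sketch, not the authors' implementation: the function names `laplacian_system`, `spectral_radius`, and `soft_threshold` are hypothetical, and the soft-thresholding step assumes the standard proximal operator of the Lasso penalty with threshold τ = ηλ, which the table does not spell out.

```python
import numpy as np


def laplacian_system(n):
    """Build (A, B, Q, R) for the Laplacian dynamics quoted in the table.

    A is tridiagonal: 1.1 on the diagonal, 0.1 on the first off-diagonals.
    B = Q = I_n and R = 1000 * I_n.
    """
    A = 1.1 * np.eye(n) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
    B = np.eye(n)
    Q = np.eye(n)
    R = 1000.0 * np.eye(n)
    return A, B, Q, R


def spectral_radius(M):
    """Largest eigenvalue magnitude; a value above 1 means x_{t+1} = M x_t is unstable."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def soft_threshold(K, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||K||_1.

    Assumption: tau plays the role of (stepsize * lambda) in a Lasso proximal step.
    """
    return np.sign(K) * np.maximum(np.abs(K) - tau, 0.0)


# Small system from the experiment setup: (n, m) = (3, 3).
A, B, Q, R = laplacian_system(3)
rho = spectral_radius(A)
print(f"spectral radius of A: {rho:.4f}")
```

For the small system the spectral radius of `A` exceeds 1, consistent with the paper's description of the open-loop dynamics as unstable. Larger thresholds in `soft_threshold` zero out more entries of a gain matrix, which is how the Lasso weight λ trades LQR performance against sparsity.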