The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
Authors: Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results validate the performance of our algorithm, and compare to natural deep-learning baselines. |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²University of Texas at Austin, ³University of Pennsylvania, ⁴Google Brain. |
| Pseudocode | Yes | Algorithm 1 Trajectory Optimization, Algorithm 2 ESTMARKOV(π; N, σw), Algorithm 3 ESTGAINS(π; N, σw, k0) |
| Open Source Code | No | No explicit statement about releasing code for the methodology described in this paper was found. The paper mentions using third-party libraries like trajax, haiku, and optax. |
| Open Datasets | No | No concrete access information for a publicly available or open dataset was provided. The paper states: "We validate our algorithms on standard models of the quadrotor and inverted pendulum", implying simulation environments rather than pre-existing datasets with access details. |
| Dataset Splits | No | No specific dataset split information (percentages, sample counts, or citations to predefined splits) was provided. The paper describes experiments in simulated environments rather than using pre-split datasets. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) were mentioned. The paper only states implementation using the "jax ecosystem". |
| Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper mentions libraries like 'jax (Bradbury et al., 2018)', 'trajax (Frostig et al., 2021)', 'haiku+optax (Hennigan et al., 2020; Babuschkin et al., 2020)' without explicit version numbers. |
| Experiment Setup | Yes | More details regarding the environments, tasks, and experimental setup details are found in Appendix J. ... For pendulum, we set the width to 96, the learning rate to 10⁻³, and the activation to swish. For quadrotor, we set the width to 128, the learning rate to 5×10⁻³, and the activation to gelu. We use the Adam optimizer with 10⁻⁴ additive weight decay and a cosine decay learning schedule. |
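The cosine decay learning schedule from the experiment-setup row can be sketched in plain Python. This mirrors the shape of schedules such as `optax.cosine_decay_schedule` (the paper uses the optax library); the `decay_steps` horizon and the `alpha` floor are illustrative assumptions, as the paper does not state them here.

```python
import math

def cosine_decay(step, init_value, decay_steps, alpha=0.0):
    """Cosine learning-rate decay: interpolates from init_value down to
    alpha * init_value over decay_steps, then stays at the floor.

    decay_steps and alpha are hypothetical parameters for illustration;
    the paper only states that a cosine decay schedule is used.
    """
    t = min(step, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * t))
    return init_value * ((1.0 - alpha) * cosine + alpha)

# E.g., for the pendulum setting (init learning rate 10^-3), the rate starts
# at 1e-3, passes 5e-4 at the halfway point, and reaches the floor at the end.
```

With `alpha=0.0` the rate decays all the way to zero, which is optax's default behavior for this schedule family.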