The Power of Learned Locally Linear Models for Nonlinear Policy Optimization

Authors: Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results validate the performance of our algorithm, and compare to natural deep-learning baselines.
Researcher Affiliation | Collaboration | Massachusetts Institute of Technology; University of Texas at Austin; University of Pennsylvania; Google Brain.
Pseudocode | Yes | Algorithm 1 (Trajectory Optimization), Algorithm 2 (ESTMARKOV(π; N, σw)), Algorithm 3 (ESTGAINS(π; N, σw, k0)).
Open Source Code | No | No explicit statement about releasing code for the methodology described in this paper was found. The paper mentions using third-party libraries such as trajax, haiku, and optax.
Open Datasets | No | No concrete access information for a publicly available or open dataset was provided. The paper states: "We validate our algorithms on standard models of the quadrotor and inverted pendulum", implying simulation environments rather than pre-existing datasets with access details.
Dataset Splits | No | No specific dataset split information (percentages, sample counts, or citations to predefined splits) was provided. The paper describes experiments in simulated environments rather than using pre-split datasets.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models or memory amounts) were mentioned. The paper only states that the implementation uses the jax ecosystem.
Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper cites jax (Bradbury et al., 2018), trajax (Frostig et al., 2021), and haiku+optax (Hennigan et al., 2020; Babuschkin et al., 2020) without explicit version numbers.
Experiment Setup | Yes | More details regarding the environments, tasks, and experimental setup are found in Appendix J. ... For pendulum, we set the width to 96, the learning rate to 10^-3, and the activation to swish. For quadrotor, we set the width to 128, the learning rate to 5 × 10^-3, and the activation to gelu. We use the Adam optimizer with 10^-4 additive weight decay and a cosine decay learning schedule.
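The reported hyperparameters and the cosine decay schedule can be summarized in a short sketch. This is a hedged reconstruction, not the authors' code: the widths, learning rates, activations, and weight decay come from the paper, while the total step count (1000) and the schedule's decay-to-zero endpoint are illustrative assumptions.

```python
import math

# Per-environment hyperparameters as reported in the paper.
CONFIG = {
    "pendulum":  {"width": 96,  "lr": 1e-3, "activation": "swish"},
    "quadrotor": {"width": 128, "lr": 5e-3, "activation": "gelu"},
}
WEIGHT_DECAY = 1e-4  # additive weight decay used with Adam (per the paper)

def cosine_decay_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Cosine decay schedule: base_lr at step 0, decaying to 0 at total_steps.

    The decay horizon (total_steps) is an assumption; the paper does not
    report it in this excerpt.
    """
    t = min(step, total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * t))

# Example: pendulum learning rate at the start and midpoint of training.
lr_start = cosine_decay_lr(CONFIG["pendulum"]["lr"], 0, 1000)    # = 1e-3
lr_mid = cosine_decay_lr(CONFIG["pendulum"]["lr"], 500, 1000)    # = 5e-4
```

In the jax ecosystem the same configuration would typically be expressed with optax's built-in schedule and optimizer combinators rather than written by hand.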