The Power of Learned Locally Linear Models for Nonlinear Policy Optimization

Authors: Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results validate the performance of our algorithm, and compare to natural deep-learning baselines.
Researcher Affiliation | Collaboration | Massachusetts Institute of Technology; University of Texas at Austin; University of Pennsylvania; Google Brain.
Pseudocode | Yes | Algorithm 1 (Trajectory Optimization), Algorithm 2 (ESTMARKOV(π; N, σw)), Algorithm 3 (ESTGAINS(π; N, σw, k0)).
Open Source Code | No | No explicit statement about releasing code for the methodology described in this paper was found. The paper mentions using third-party libraries such as trajax, haiku, and optax.
Open Datasets | No | No concrete access information for a publicly available or open dataset was provided. The paper states: "We validate our algorithms on standard models of the quadrotor and inverted pendulum", implying simulation environments rather than pre-existing datasets with access details.
Dataset Splits | No | No specific dataset split information (percentages, sample counts, or citations to predefined splits) was provided. The paper describes experiments in simulated environments rather than using pre-split datasets.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models or memory amounts) were mentioned. The paper only states that the implementation uses the jax ecosystem.
Software Dependencies | No | No specific version numbers for software dependencies were provided. The paper cites jax (Bradbury et al., 2018), trajax (Frostig et al., 2021), and haiku+optax (Hennigan et al., 2020; Babuschkin et al., 2020) without explicit version numbers.
Experiment Setup | Yes | More details regarding the environments, tasks, and experimental setup are found in Appendix J. ... For pendulum, we set the width to 96, the learning rate to 10^-3, and the activation to swish. For quadrotor, we set the width to 128, the learning rate to 5 × 10^-3, and the activation to gelu. We use the Adam optimizer with 10^-4 additive weight decay and a cosine decay learning schedule.
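The reported hyperparameters and the cosine decay schedule can be summarized in a short sketch. This is a hedged reconstruction, not the authors' code: the widths, learning rates, activations, and weight decay come from the paper, while the total step count (1000) and the schedule's decay-to-zero endpoint are illustrative assumptions.

```python
import math

# Per-environment hyperparameters as reported in the paper.
CONFIG = {
    "pendulum":  {"width": 96,  "lr": 1e-3, "activation": "swish"},
    "quadrotor": {"width": 128, "lr": 5e-3, "activation": "gelu"},
}
WEIGHT_DECAY = 1e-4  # additive weight decay used with Adam (per the paper)

def cosine_decay_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Cosine decay schedule: base_lr at step 0, decaying to 0 at total_steps.

    The decay horizon (total_steps) is an assumption; the paper does not
    report it in this excerpt.
    """
    t = min(step, total_steps) / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * t))

# Example: pendulum learning rate at the start and midpoint of training.
lr_start = cosine_decay_lr(CONFIG["pendulum"]["lr"], 0, 1000)    # = 1e-3
lr_mid = cosine_decay_lr(CONFIG["pendulum"]["lr"], 500, 1000)    # = 5e-4
```

In the jax ecosystem the same configuration would typically be expressed with optax's built-in schedule and optimizer combinators rather than written by hand.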