Can Learned Optimization Make Reinforcement Learning Less Difficult?

Authors: Alexander D. Goldie, Chris Lu, Matthew T. Jackson, Shimon Whiteson, Jakob Foerster

NeurIPS 2024

Reproducibility assessment: each item below gives the reproducibility variable, the assessed result, and the LLM response.
Research Type: Experimental. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization characteristics across a range of environments and agent architectures.
Researcher Affiliation: Academia. 1 FLAIR, University of Oxford; 2 WhiRL, University of Oxford.
Pseudocode: Yes. A single inner loop step is described algorithmically in Appendix B.1. ... Algorithm 1: Example update step from OPEN. (A hedged update-step sketch appears below this list.)
Open Source Code: Yes. Open-source code is available here. ... We release all code used for OPEN in a publicly available repository.
Open Datasets: Yes. We test this in five environments: Breakout, Asterix, Space Invaders and Freeway from MinAtar [62, 63]; and Ant from Brax [64, 65]. ... We train OPEN on a distribution of gridworlds from Jackson et al. [66] ... and a set of mazes from minigrid [67] which are not in the training distribution, with unseen agent parameters. Furthermore, we test how OPEN performs 0-shot in Craftax-Classic [21]. (An environment-setup sketch appears below this list.)
Dataset Splits: No. The paper mentions evaluating on a "small in-distribution validation set" to select the best meta-trained optimizer (Appendix C.5), but it does not give quantitative details (e.g., percentages or counts) of training/validation/test splits in the traditional supervised-learning sense; in RL, the data is generated dynamically by interacting with the environments.
Hardware Specification: Yes. We include runtimes (inference) for our experiments with the different optimizers on 4 L40s GPUs. ... We used a range of hardware for training: Nvidia A40s, Nvidia L40Ses, Nvidia GeForce GTX 1080Tis, Nvidia GeForce RTX 2080Tis and Nvidia GeForce RTX 3080s.
Software Dependencies: No. The paper mentions several software libraries and frameworks (e.g., JAX [33], evosax [34], optax [69], PureJaxRL [15]) and lists them in Appendix K. However, it does not provide specific version numbers for these dependencies. (A version-logging snippet appears below this list.)
Experiment Setup: Yes. We provide the cost of experiments in Appendix J, including a comparison of runtimes with other optimizers. We detail hyperparameters in Appendix C.5. ... Table 3: PPO hyperparameters. ... Table 4: Optimization hyperparameters for MinAtar environments. (An illustrative configuration sketch appears below this list.)
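
The Pseudocode item points to Algorithm 1, an example update step from OPEN (Appendix B.1). As a rough illustration only, the following is a minimal sketch of what one inner-loop step of a learned optimizer can look like in JAX: a tiny per-parameter MLP maps gradient and momentum features to a parameter update. The feature set, network shape, function names, and learning rate are illustrative assumptions, not OPEN's actual design.

```python
import jax
import jax.numpy as jnp

# Hypothetical sketch of one inner-loop update step for a learned optimizer.
# The features and architecture are illustrative assumptions, not OPEN's
# exact design (see Algorithm 1 / Appendix B.1 of the paper for that).

def init_meta_params(key, hidden=16):
    """Tiny per-parameter MLP: 2 input features -> hidden -> 1 output."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }

def learned_update_step(meta_params, agent_params, grads, momentum,
                        beta=0.9, lr=1e-3):
    """Apply one learned-optimizer step to every agent parameter."""
    # Track momentum as an extra per-parameter input feature.
    new_momentum = jax.tree_util.tree_map(
        lambda m, g: beta * m + (1.0 - beta) * g, momentum, grads
    )

    def per_param(p, g, m):
        # Stack per-parameter features; OPEN conditions on more inputs
        # (e.g. dormancy, training progress), omitted here for brevity.
        feats = jnp.stack([g, m], axis=-1)                     # (..., 2)
        h = jnp.tanh(feats @ meta_params["w1"] + meta_params["b1"])
        delta = (h @ meta_params["w2"] + meta_params["b2"]).squeeze(-1)
        return p - lr * delta                                  # proposed update

    new_params = jax.tree_util.tree_map(per_param, agent_params, grads, new_momentum)
    return new_params, new_momentum
```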
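
The Open Datasets item lists the evaluation environments. In JAX pipelines the MinAtar tasks are commonly accessed through gymnax; the snippet below shows one plausible way to instantiate and step "Breakout-MinAtar". The choice of gymnax and the environment IDs are assumptions about tooling, not a statement of exactly how the paper's released code does it, and Brax's Ant would be created through brax.envs instead.

```python
import jax
import gymnax

# One plausible way to instantiate a MinAtar evaluation task via gymnax.
# Other IDs in the same registry: "Asterix-MinAtar", "SpaceInvaders-MinAtar",
# "Freeway-MinAtar".
env, env_params = gymnax.make("Breakout-MinAtar")

key = jax.random.PRNGKey(0)
key_reset, key_act, key_step = jax.random.split(key, 3)

obs, state = env.reset(key_reset, env_params)            # initial observation
action = env.action_space(env_params).sample(key_act)    # random action
obs, state, reward, done, info = env.step(key_step, state, action, env_params)
```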
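
The Software Dependencies item notes that no version numbers are given. A simple mitigation when re-running the released code is to log the versions that happen to be installed; the package list below mirrors the libraries named in that item, plus jaxlib (PureJaxRL is typically vendored from GitHub rather than installed from PyPI, so it is omitted). This is a generic reproducibility aid, not something the paper provides.

```python
# Log installed versions of the JAX-ecosystem dependencies named above,
# since the paper itself does not pin them.
import importlib.metadata as md

for pkg in ["jax", "jaxlib", "evosax", "optax"]:
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```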
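
The Experiment Setup item points to the PPO and optimization hyperparameters in Appendix C.5 (Tables 3 and 4). For readers unfamiliar with what such a configuration contains, here is a PureJaxRL-style config dict; the keys are typical for that codebase and the values are common placeholder defaults, not the paper's actual settings.

```python
# Illustrative shape of a PPO hyperparameter configuration (PureJaxRL style).
# Values are placeholders, NOT the paper's Table 3 / Table 4 settings.
ppo_config = {
    "NUM_ENVS": 64,
    "NUM_STEPS": 128,            # rollout length per environment
    "TOTAL_TIMESTEPS": 1_000_000,
    "UPDATE_EPOCHS": 4,
    "NUM_MINIBATCHES": 8,
    "GAMMA": 0.99,               # discount factor
    "GAE_LAMBDA": 0.95,
    "CLIP_EPS": 0.2,
    "ENT_COEF": 0.01,
    "VF_COEF": 0.5,
    "MAX_GRAD_NORM": 0.5,
    "LR": 3e-4,                  # replaced by the learned optimizer under OPEN
}
```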