Can Learned Optimization Make Reinforcement Learning Less Difficult?

Authors: Alexander D. Goldie, Chris Lu, Matthew T. Jackson, Shimon Whiteson, Jakob Foerster

NeurIPS 2024

Reproducibility assessment: each item below gives the reproducibility variable, the assessed result, and the LLM response.
Research Type: Experimental. Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization characteristics across a range of environments and agent architectures.
Researcher Affiliation: Academia. 1 FLAIR, University of Oxford; 2 WhiRL, University of Oxford.
Pseudocode: Yes. A single inner loop step is described algorithmically in Appendix B.1. ... Algorithm 1: Example update step from OPEN. (A hedged update-step sketch appears below this list.)
Open Source Code: Yes. Open-source code is available here. ... We release all code used for OPEN in a publicly available repository.
Open Datasets: Yes. We test this in five environments: Breakout, Asterix, Space Invaders and Freeway from MinAtar [62, 63]; and Ant from Brax [64, 65]. ... We train OPEN on a distribution of gridworlds from Jackson et al. [66] ... and a set of mazes from minigrid [67] which are not in the training distribution, with unseen agent parameters. Furthermore, we test how OPEN performs 0-shot in Craftax-Classic [21]. (An environment-setup sketch appears below this list.)
Dataset Splits: No. The paper mentions evaluating on a "small in-distribution validation set" to select the best meta-trained optimizer (Appendix C.5), but it does not give quantitative details (e.g., percentages or counts) of training/validation/test splits in the traditional supervised-learning sense; in RL, the data is generated dynamically by interacting with the environments.
Hardware Specification: Yes. We include runtimes (inference) for our experiments with the different optimizers on 4 L40s GPUs. ... We used a range of hardware for training: Nvidia A40s, Nvidia L40Ses, Nvidia GeForce GTX 1080Tis, Nvidia GeForce RTX 2080Tis and Nvidia GeForce RTX 3080s.
Software Dependencies: No. The paper mentions several software libraries and frameworks (e.g., JAX [33], evosax [34], optax [69], PureJaxRL [15]) and lists them in Appendix K. However, it does not provide specific version numbers for these dependencies. (A version-logging snippet appears below this list.)
Experiment Setup: Yes. We provide the cost of experiments in Appendix J, including a comparison of runtimes with other optimizers. We detail hyperparameters in Appendix C.5. ... Table 3: PPO hyperparameters. ... Table 4: Optimization hyperparameters for MinAtar environments. (An illustrative configuration sketch appears below this list.)
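
The Pseudocode item points to Algorithm 1, an example update step from OPEN (Appendix B.1). As a rough illustration only, the following is a minimal sketch of what one inner-loop step of a learned optimizer can look like in JAX: a tiny per-parameter MLP maps gradient and momentum features to a parameter update. The feature set, network shape, function names, and learning rate are illustrative assumptions, not OPEN's actual design.

```python
import jax
import jax.numpy as jnp

# Hypothetical sketch of one inner-loop update step for a learned optimizer.
# The features and architecture are illustrative assumptions, not OPEN's
# exact design (see Algorithm 1 / Appendix B.1 of the paper for that).

def init_meta_params(key, hidden=16):
    """Tiny per-parameter MLP: 2 input features -> hidden -> 1 output."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }

def learned_update_step(meta_params, agent_params, grads, momentum,
                        beta=0.9, lr=1e-3):
    """Apply one learned-optimizer step to every agent parameter."""
    # Track momentum as an extra per-parameter input feature.
    new_momentum = jax.tree_util.tree_map(
        lambda m, g: beta * m + (1.0 - beta) * g, momentum, grads
    )

    def per_param(p, g, m):
        # Stack per-parameter features; OPEN conditions on more inputs
        # (e.g. dormancy, training progress), omitted here for brevity.
        feats = jnp.stack([g, m], axis=-1)                     # (..., 2)
        h = jnp.tanh(feats @ meta_params["w1"] + meta_params["b1"])
        delta = (h @ meta_params["w2"] + meta_params["b2"]).squeeze(-1)
        return p - lr * delta                                  # proposed update

    new_params = jax.tree_util.tree_map(per_param, agent_params, grads, new_momentum)
    return new_params, new_momentum
```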
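
The Open Datasets item lists the evaluation environments. In JAX pipelines the MinAtar tasks are commonly accessed through gymnax; the snippet below shows one plausible way to instantiate and step "Breakout-MinAtar". The choice of gymnax and the environment IDs are assumptions about tooling, not a statement of exactly how the paper's released code does it, and Brax's Ant would be created through brax.envs instead.

```python
import jax
import gymnax

# One plausible way to instantiate a MinAtar evaluation task via gymnax.
# Other IDs in the same registry: "Asterix-MinAtar", "SpaceInvaders-MinAtar",
# "Freeway-MinAtar".
env, env_params = gymnax.make("Breakout-MinAtar")

key = jax.random.PRNGKey(0)
key_reset, key_act, key_step = jax.random.split(key, 3)

obs, state = env.reset(key_reset, env_params)            # initial observation
action = env.action_space(env_params).sample(key_act)    # random action
obs, state, reward, done, info = env.step(key_step, state, action, env_params)
```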
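
The Software Dependencies item notes that no version numbers are given. A simple mitigation when re-running the released code is to log the versions that happen to be installed; the package list below mirrors the libraries named in that item, plus jaxlib (PureJaxRL is typically vendored from GitHub rather than installed from PyPI, so it is omitted). This is a generic reproducibility aid, not something the paper provides.

```python
# Log installed versions of the JAX-ecosystem dependencies named above,
# since the paper itself does not pin them.
import importlib.metadata as md

for pkg in ["jax", "jaxlib", "evosax", "optax"]:
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```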
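
The Experiment Setup item points to the PPO and optimization hyperparameters in Appendix C.5 (Tables 3 and 4). For readers unfamiliar with what such a configuration contains, here is a PureJaxRL-style config dict; the keys are typical for that codebase and the values are common placeholder defaults, not the paper's actual settings.

```python
# Illustrative shape of a PPO hyperparameter configuration (PureJaxRL style).
# Values are placeholders, NOT the paper's Table 3 / Table 4 settings.
ppo_config = {
    "NUM_ENVS": 64,
    "NUM_STEPS": 128,            # rollout length per environment
    "TOTAL_TIMESTEPS": 1_000_000,
    "UPDATE_EPOCHS": 4,
    "NUM_MINIBATCHES": 8,
    "GAMMA": 0.99,               # discount factor
    "GAE_LAMBDA": 0.95,
    "CLIP_EPS": 0.2,
    "ENT_COEF": 0.01,
    "VF_COEF": 0.5,
    "MAX_GRAD_NORM": 0.5,
    "LR": 3e-4,                  # replaced by the learned optimizer under OPEN
}
```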