Can Learned Optimization Make Reinforcement Learning Less Difficult?
Authors: Alexander D. Goldie, Chris Lu, Matthew T. Jackson, Shimon Whiteson, Jakob Foerster
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that when meta-trained on single and small sets of environments, OPEN outperforms or equals traditionally used optimizers. Furthermore, OPEN shows strong generalization characteristics across a range of environments and agent architectures. |
| Researcher Affiliation | Academia | 1FLAIR, University of Oxford; 2WhiRL, University of Oxford |
| Pseudocode | Yes | A single inner loop step is described algorithmically in Appendix B.1. ... Algorithm 1: Example update step from OPEN. |
| Open Source Code | Yes | Open-source code is available here. ... We release all code used for OPEN in a publicly available repository. |
| Open Datasets | Yes | We test this in five environments: Breakout, Asterix, Space Invaders and Freeway from MinAtar [62, 63]; and Ant from Brax [64, 65]. ... We train OPEN on a distribution of gridworlds from Jackson et al. [66] ... and a set of mazes from minigrid [67] which are not in the training distribution, with unseen agent parameters. Furthermore, we test how OPEN performs 0-shot in Craftax-Classic [21]. |
| Dataset Splits | No | While the paper mentions evaluating on a "small in-distribution validation set" to select the best meta-trained optimizer (Appendix C.5), it does not provide quantitative details of the splits (e.g., percentages or counts) for training, validation, and testing in the traditional supervised-learning sense. For RL, data is generated dynamically from the environments. |
| Hardware Specification | Yes | We include runtimes (inference) for our experiments with the different optimizers on 4 L40S GPUs. ... We used a range of hardware for training: Nvidia A40s, Nvidia L40Ses, Nvidia GeForce GTX 1080 Tis, Nvidia GeForce RTX 2080 Tis and Nvidia GeForce RTX 3080s. |
| Software Dependencies | No | The paper mentions several software libraries and frameworks (e.g., JAX [33], evosax [34], optax [69], PureJaxRL [15]) and lists them in Appendix K. However, it does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We provide the cost of experiments in Appendix J, including a comparison of runtimes with other optimizers. We detail hyperparameters in Appendix C.5. ... Table 3: PPO hyperparameters. ... Table 4: Optimization hyperparameters for MinAtar environments. |