AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
Authors: Jake Grigsby, Justin Sasek, Samyak Parajuli, Ikechukwu D. Adebi, Amy Zhang, Yuke Zhu
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Large-scale comparisons in Meta-World ML45, Multi-Game Procgen, Multi-Task POPGym, Multi-Game Atari, and Baby AI find that this design unlocks significant progress in online multi-task adaptation and memory problems without explicit task labels. |
| Researcher Affiliation | Academia | Jake Grigsby, Justin Sasek, Samyak Parajuli, Daniel Adebi, Amy Zhang, Yuke Zhu; The University of Texas at Austin (equal contribution); {grigsby,yukez}@cs.utexas.edu |
| Pseudocode | No | The paper describes the methods using equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code for the agent and multi-task environments used in our experiments is available on GitHub at UT-Austin-RPL/amago. |
| Open Datasets | Yes | Comparisons on Meta-World ML45 [17], Multi-Task POPGym [27], Multi-Game Procgen [28], Multi-Game Atari [29], and Multi-Task Baby AI [30] evaluate the importance of scale-resistant updates. |
| Dataset Splits | No | The paper mentions generating 'train/test' splits for datasets like Baby AI and Meta-World, but it does not provide the specific percentages or sample counts needed to rigorously reproduce the train/validation/test partitioning (see the illustrative split sketch after this table). |
| Hardware Specification | Yes | All of the results in this paper were completed on NVIDIA A5000 GPUs. We train each agent on one GPU whenever possible but add a second GPU for Procgen Memory-Hard (Figure 8) where model size and context length use all available memory. |
| Software Dependencies | No | The paper mentions various software components and techniques used, such as the AdamW optimizer [104], NormFormer [105], σReparam [106], the IMPALA CNN [107], DrQV2 [109], and LayerNorm [110]. However, it does not provide specific version numbers for any of these software dependencies (a version-recording sketch follows this table). |
| Experiment Setup | Yes | "Table 1: Learning Hyperparameter Details" and "Table 2: Agent Architecture Details" in Appendix A provide specific values for hyperparameters and architectural configurations. |
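To make the Dataset Splits concern concrete, here is a minimal sketch of what a fully specified, reproducible task partition could look like. The function name, split sizes, and task names below are hypothetical illustrations, not details taken from the paper:

```python
import random

def make_task_split(task_names, n_test, seed=0):
    """Deterministically partition task names into train/test sets.

    Recording the seed and counts alongside results makes the partition
    reproducible: the same inputs always recover the same split.
    """
    tasks = sorted(task_names)   # canonical order before shuffling
    rng = random.Random(seed)    # seeded RNG, independent of global state
    rng.shuffle(tasks)
    test, train = tasks[:n_test], tasks[n_test:]
    return train, test

# Hypothetical usage: 50 tasks, 5 held out, seed logged with the run.
train_tasks, test_tasks = make_task_split(
    [f"task-{i}" for i in range(50)], n_test=5, seed=42
)
print(len(train_tasks), len(test_tasks))  # 45 5
```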
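Likewise, for the missing dependency versions, a small sketch of how one could record the exact installed versions alongside experiment outputs. The package list here is an example, not a verified inventory of the paper's environment:

```python
from importlib.metadata import version, PackageNotFoundError
import json
import platform

def snapshot_versions(packages):
    """Record installed package versions plus the Python version for a run log."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = "not installed"
    versions["python"] = platform.python_version()
    return versions

# Example packages; the paper's actual dependencies live in its repository.
print(json.dumps(snapshot_versions(["torch", "numpy", "gymnasium"]), indent=2))
```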