Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
Authors: Jake Grigsby, Justin Sasek, Samyak Parajuli, Ikechukwu D. Adebi, Amy Zhang, Yuke Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Large-scale comparisons in Meta-World ML45, Multi-Game Procgen, Multi-Task POPGym, Multi-Game Atari, and BabyAI find that this design unlocks significant progress in online multi-task adaptation and memory problems without explicit task labels. |
| Researcher Affiliation | Academia | Jake Grigsby, Justin Sasek, Samyak Parajuli, Daniel Adebi, Amy Zhang, Yuke Zhu; The University of Texas at Austin; equal contribution; EMAIL |
| Pseudocode | No | The paper describes the methods using equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code for the agent and multi-task environments used in our experiments is available on GitHub at UT-Austin-RPL/amago. |
| Open Datasets | Yes | Comparisons on Meta-World ML45 [17], Multi-Task POPGym [27], Multi-Game Procgen [28], Multi-Game Atari [29], and Multi-Task BabyAI [30] evaluate the importance of scale-resistant updates. |
| Dataset Splits | No | The paper mentions generating train/test splits for datasets such as BabyAI and Meta-World, but it does not provide the specific percentages or sample counts for the training, validation, and test splits needed to reproduce the data partitioning rigorously. |
| Hardware Specification | Yes | All of the results in this paper were completed on NVIDIA A5000 GPUs. We train each agent on one GPU whenever possible but add a second GPU for Procgen Memory-Hard (Figure 8) where model size and context length use all available memory. |
| Software Dependencies | No | The paper mentions various software components and techniques, such as the AdamW optimizer [104], NormFormer [105], σReparam [106], IMPALA CNN [107], DrQ-v2 [109], and LayerNorm [110]. However, it does not provide version numbers for any of these software dependencies. |
| Experiment Setup | Yes | "Table 1: Learning Hyperparameter Details" and "Table 2: Agent Architecture Details" in Appendix A provide specific values for hyperparameters and architectural configurations. |