Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Authors: Jake Grigsby, Linxi Fan, Yuke Zhu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our agent is scalable and applicable to a wide range of problems, and we demonstrate its strong performance empirically in meta-RL and long-term memory domains. AMAGO's focus on sparse rewards and off-policy data also allows in-context learning to extend to goal-conditioned problems with challenging exploration. Our experiments are divided into two parts. |
| Researcher Affiliation | Collaboration | Jake Grigsby¹, Linxi Jim Fan², Yuke Zhu¹; ¹The University of Texas at Austin, ²NVIDIA Research |
| Pseudocode | Yes | Algorithm 1 Simplified Hindsight Instruction Relabeling |
| Open Source Code | Yes | Our agent is open-source¹ and specifically designed to be efficient, stable, and applicable to new environments with little tuning. ¹Code is available here: https://ut-austin-rpl.github.io/amago/ |
| Open Datasets | Yes | We empirically demonstrate its power and flexibility in existing meta-RL and memory benchmarks, including state-of-the-art results in the POPGym suite [30]... We evaluate AMAGO on two new benchmarks... before applying it to instruction-following tasks in the procedurally generated worlds of Crafter [33]. |
| Dataset Splits | No | The paper mentions tuning AMAGO on one environment for POPGym before applying it to others, and evaluating on 'held-out test tasks' for Meta-World, but it does not specify explicit train/validation splits (e.g., 80/10/10 percentages) for any of the datasets used to reproduce the experiments. |
| Hardware Specification | Yes | Each AMAGO agent is trained on one A5000 GPU. We compare training throughput in a common locomotion benchmark [97] with more details in Appendix D. We compare training throughput... on a single NVIDIA A5000 GPU. |
| Software Dependencies | No | The paper mentions its open-source nature and provides a link to its code (which would implicitly use Python and PyTorch), but it does not explicitly list software names with version numbers in the text for reproducibility. |
| Experiment Setup | Yes | AMAGO Hyperparameter Information. Network architecture details for our main experimental domains are provided in Table 3. Table 4 lists the hyperparameters for our RL training process. Many of AMAGO's details are designed to reduce hyperparameter sensitivity, and this allows us to use a consistent configuration across most experiments. |