AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Authors: Jake Grigsby, Linxi Fan, Yuke Zhu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our agent is scalable and applicable to a wide range of problems, and we demonstrate its strong performance empirically in meta-RL and long-term memory domains. AMAGO's focus on sparse rewards and off-policy data also allows in-context learning to extend to goal-conditioned problems with challenging exploration. Our experiments are divided into two parts. |
| Researcher Affiliation | Collaboration | Jake Grigsby1, Linxi Jim Fan2, Yuke Zhu1 1The University of Texas at Austin 2NVIDIA Research |
| Pseudocode | Yes | Algorithm 1 Simplified Hindsight Instruction Relabeling |
| Open Source Code | Yes | Our agent is open-source1 and specifically designed to be efficient, stable, and applicable to new environments with little tuning. 1Code is available here: https://ut-austin-rpl.github.io/amago/ |
| Open Datasets | Yes | We empirically demonstrate its power and flexibility in existing meta-RL and memory benchmarks, including state-of-the-art results in the POPGym suite [30]... We evaluate AMAGO on two new benchmarks... before applying it to instruction-following tasks in the procedurally generated worlds of Crafter [33]. |
| Dataset Splits | No | The paper mentions tuning AMAGO on one environment for POPGym before applying it to others, and evaluating on 'held-out test tasks' for Meta-World, but it does not specify explicit train/validation splits (e.g., 80/10/10 percentages) for any of the datasets used to reproduce the experiments. |
| Hardware Specification | Yes | Each AMAGO agent is trained on one A5000 GPU. We compare training throughput in a common locomotion benchmark [97] on a single NVIDIA A5000 GPU, with more details in Appendix D. |
| Software Dependencies | No | The paper mentions its open-source nature and provides a link to its code (which would implicitly use Python and PyTorch), but it does not explicitly list software names with version numbers in the text for reproducibility. |
| Experiment Setup | Yes | AMAGO Hyperparameter Information. Network architecture details for our main experimental domains are provided in Table 3. Table 4 lists the hyperparameters for our RL training process. Many of AMAGO's details are designed to reduce hyperparameter sensitivity, and this allows us to use a consistent configuration across most experiments. |
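For readers checking the Pseudocode row: the paper's Algorithm 1 is a simplified hindsight instruction relabeling procedure. The following is a minimal HER-style sketch of the general relabeling idea only, not the paper's implementation; the function name, data layout, and the `k_goals` parameter are our own assumptions for illustration.

```python
import random

def hindsight_relabel(trajectory, k_goals=2):
    """Relabel a trajectory by treating goals it actually achieved as if
    they had been the original instruction (HER-style sketch, hypothetical).

    trajectory: list of (state, action, achieved_goal) tuples, where
    achieved_goal is None on steps where nothing was achieved.
    Returns (new_instruction, relabeled_steps) or None if nothing
    was achieved and relabeling is impossible.
    """
    achieved = [g for _, _, g in trajectory if g is not None]
    if not achieved:
        return None
    # Sample a subset of achieved goals to serve as the new instruction.
    new_instruction = random.sample(achieved, min(k_goals, len(achieved)))
    relabeled = []
    rewarded = set()
    for state, action, goal in trajectory:
        # Sparse reward: +1 the first time each instruction goal is achieved.
        reward = 1.0 if goal in new_instruction and goal not in rewarded else 0.0
        if reward > 0:
            rewarded.add(goal)
        relabeled.append((state, action, reward))
    return new_instruction, relabeled
```

Because the relabeled trajectory is guaranteed to succeed at its (rewritten) instruction, even failed rollouts yield positive sparse-reward training signal.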