Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning
Authors: Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate online adaptation for continuous control tasks on both simulated and real-world agents. |
| Researcher Affiliation | Academia | Anusha Nagabandi*, Ignasi Clavera*, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, & Chelsea Finn. University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Model-Based Meta-Reinforcement Learning (train time) and Algorithm 2 Online Model Adaptation (test time) are provided in Section 4. |
| Open Source Code | No | The paper mentions videos are available online at a project website, but does not state that source code for their method is released or provide a link to a code repository. |
| Open Datasets | No | We meta-train a dynamics model for this robot using the meta-objective described in Equation 3, and we train it to adapt on entirely real-world data from three different training terrains: carpet, styrofoam, and turf. We collect approximately 30 minutes of data from each of the three training terrains. The paper describes a custom dataset but does not provide concrete access information (link, DOI, citation) for public availability. |
| Dataset Splits | Yes | In these experiments, note that all agents were meta-trained on a distribution of tasks/environments (as detailed above), but we then evaluate their adaptation ability on unseen environments at test time. |
| Hardware Specification | No | All experiments are conducted in a motion capture room. Computation is done on an external computer... The paper does not provide specific hardware details such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions the "MuJoCo physics engine (Todorov et al., 2012)" but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Appendix E, titled "HYPERPARAMETERS", provides Tables 3, 4, and 5, which list specific values for learning rates, epochs, K, M, batch sizes, and other training parameters for the different tasks. |