Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning
Authors: Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate online adaptation for continuous control tasks on both simulated and real-world agents. |
| Researcher Affiliation | Academia | Anusha Nagabandi*, Ignasi Clavera*, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, & Chelsea Finn; University of California, Berkeley; {nagaban2,iclavera,simin.liu}@berkeley.edu, {ronf,pabbeel,svlevine,cbfinn}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 (Model-Based Meta-Reinforcement Learning, train time) and Algorithm 2 (Online Model Adaptation, test time) are provided in Section 4. |
| Open Source Code | No | The paper mentions videos are available online at a project website, but does not state that source code for their method is released or provide a link to a code repository. |
| Open Datasets | No | We meta-train a dynamics model for this robot using the meta-objective described in Equation 3, and we train it to adapt on entirely real-world data from three different training terrains: carpet, styrofoam, and turf. We collect approximately 30 minutes of data from each of the three training terrains. The paper describes a custom dataset but does not provide concrete access information (link, DOI, citation) for public availability. |
| Dataset Splits | Yes | In these experiments, note that all agents were meta-trained on a distribution of tasks/environments (as detailed above), but we then evaluate their adaptation ability on unseen environments at test time. |
| Hardware Specification | No | All experiments are conducted in a motion capture room. Computation is done on an external computer... The paper does not provide specific hardware details such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions the "MuJoCo physics engine (Todorov et al., 2012)" but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | Appendix E, titled "HYPERPARAMETERS", provides Tables 3, 4, and 5, which list specific values for learning rates, epochs, K, M, batch sizes, and other training parameters for the different tasks. |