Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL
Authors: Anusha Nagabandi, Chelsea Finn, Sergey Levine
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control; we demonstrate that MOLe outperforms alternative prior methods, and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances. |
| Researcher Affiliation | Academia | Anusha Nagabandi, Chelsea Finn & Sergey Levine University of California, Berkeley {nagaban2,cbfinn,svlevine}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Online Learning with Mixture of Meta Trained Networks |
| Open Source Code | No | The paper provides a link for videos, not for open-source code: https://sites.google.com/berkeley.edu/onlineviameta |
| Open Datasets | No | The paper mentions using agents in the MuJoCo physics engine (Todorov et al., 2012) and training models on simulated data with varying conditions (e.g., 'random slopes of low magnitudes', 'random joints being crippled'), but it does not specify a pre-existing publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions that MAML uses 'training and validation subsets ($D^{tr}_T$ and $D^{val}_T$)' internally during meta-training, where '$D^{tr}_T$ is of size k'. However, it does not provide specific percentages, sample counts, or citations for the overall training/validation/test splits of the experimental data. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions using the 'MuJoCo physics engine' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | In all experiments, we use a dynamics model consisting of three hidden layers, each of dimension 500, with ReLU nonlinearities. ... Table 1: Hyperparameters for train-time ... Table 2: Hyperparameters for run-time |
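
The architecture quoted in the Experiment Setup row (three hidden layers of dimension 500 with ReLU nonlinearities) can be sketched as a plain multilayer perceptron. This is an illustrative sketch only, not the authors' implementation. The state and action dimensions, the weight initialization, the choice to concatenate state and action as input, and the helper names `init_mlp`, `forward`, and `count_params` are all assumptions, since this summary does not specify them.

```python
import random

HIDDEN_DIM = 500   # from the setup row: each hidden layer has dimension 500
NUM_HIDDEN = 3     # from the setup row: three hidden layers
STATE_DIM = 4      # hypothetical; not given in this summary
ACTION_DIM = 2     # hypothetical; not given in this summary


def init_mlp(sizes, seed=0):
    """Return a list of (weights, biases) pairs, one per layer."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5  # He-style scale; an assumption
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Apply each affine layer, with ReLU after every layer except the last."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU nonlinearity
    return x


def count_params(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Input is the concatenated (state, action); output has the state's dimension.
sizes = [STATE_DIM + ACTION_DIM] + [HIDDEN_DIM] * NUM_HIDDEN + [STATE_DIM]
model = init_mlp(sizes)
prediction = forward(model, [0.1] * (STATE_DIM + ACTION_DIM))
```

With these hypothetical input and output sizes, nearly all parameters sit in the two 500-by-500 hidden-to-hidden layers, so the total changes little with the size of the state and action spaces.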