Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL

Authors: Anusha Nagabandi, Chelsea Finn, Sergey Levine

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "we apply our meta-learning for online learning (MOLe) approach to model-based reinforcement learning, where adapting the predictive model is critical for control; we demonstrate that MOLe outperforms alternative prior methods, and enables effective continuous adaptation in non-stationary task distributions such as varying terrains, motor failures, and unexpected disturbances." |
| Researcher Affiliation | Academia | Anusha Nagabandi, Chelsea Finn & Sergey Levine, University of California, Berkeley ({nagaban2,cbfinn,svlevine}@berkeley.edu) |
| Pseudocode | Yes | Algorithm 1: Online Learning with Mixture of Meta-Trained Networks |
| Open Source Code | No | The paper provides a link for videos, not for open-source code: https://sites.google.com/berkeley.edu/onlineviameta |
| Open Datasets | No | The paper mentions using agents in the MuJoCo physics engine (Todorov et al., 2012) and training models on simulated data with varying conditions (e.g., "random slopes of low magnitudes", "random joints being crippled"), but it does not specify a pre-existing publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions that MAML uses training and validation subsets (D^tr_T and D^val_T) internally during meta-training, where D^tr_T is of size k. However, it does not provide specific percentages, sample counts, or citations for overall training/validation/test splits of the experimental data. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions using the MuJoCo physics engine but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | "In all experiments, we use a dynamics model consisting of three hidden layers, each of dimension 500, with ReLU nonlinearities." Table 1 lists train-time hyperparameters; Table 2 lists run-time hyperparameters. |
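The one architectural detail the paper does give (a dynamics model with three hidden layers of width 500 and ReLU nonlinearities) can be sketched as a plain numpy forward pass. The input/output conventions here — concatenated state-action input and a predicted state delta — are common in model-based RL but are assumptions for illustration, as are the function names and the initialization scheme:

```python
import numpy as np

def init_dynamics_model(state_dim, action_dim, hidden_dim=500, seed=0):
    """Initialize weights for a 3-hidden-layer ReLU MLP (He-style init assumed)."""
    rng = np.random.default_rng(seed)
    dims = [state_dim + action_dim, hidden_dim, hidden_dim, hidden_dim, state_dim]
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]

def predict_next_state(params, state, action):
    """Forward pass: three ReLU hidden layers, linear output head.

    Predicting a delta and adding it to the current state is an assumed
    convention, not confirmed by the excerpt above.
    """
    x = np.concatenate([state, action], axis=-1)
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layer
    W, b = params[-1]
    delta = x @ W + b                   # linear output layer
    return state + delta
```

In MOLe-style online adaptation, the parameters of such a model would be updated continually from recent experience; this sketch covers only the network itself.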