Meta-Learning Representations for Continual Learning
Authors: Khurram Javed, Martha White
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we investigate the question: can we learn a representation for continual learning that promotes future learning and reduces interference? We investigate this question by meta-learning the representations offline on a meta-training dataset. At meta-test time, we initialize the continual learner with this representation and measure prediction error as the agent learns the PLN (prediction learning network) online on a new set of CLP (continual learning prediction) problems (see Figure 1). We evaluate on a simulated regression problem and a sequential classification problem using real data. A sketch of this meta-test protocol is given after the table. |
| Researcher Affiliation | Academia | Khurram Javed, Martha White, Department of Computing Science, University of Alberta, T6G 1P8; kjaved@ualberta.ca, whitem@ualberta.ca |
| Pseudocode | Yes | Algorithm 1: Meta-Training: MAML-Rep; Algorithm 2: Meta-Training: OML. A hedged sketch of the OML meta-training step follows the table. |
| Open Source Code | Yes | Code accompanying paper available at https://github.com/khurramjaved96/mrcl |
| Open Datasets | Yes | Omniglot is a dataset of 1,623 characters from 50 different alphabets (Lake et al., 2015). |
| Dataset Splits | Yes | For each of the methods, we separately tune the learning rate on five validation trajectories and report results for the best performing parameter. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Adam (Kingma and Ba, 2014) for optimizing the OML objective, but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We use SGD on the MSE loss with a mini-batch size of 8 for online updates, and Adam (Kingma and Ba, 2014) for optimizing the OML objective. At evaluation time, we use the same learning rate as used during the inner updates in the meta-training phase for OML. For our baselines, we do a grid search over learning rates and report the results for the best performing parameter. We use six layers for the RLN (representation learning network) and two layers for the PLN; each hidden layer has a width of 300. A sketch of this architecture follows the table. |
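
The experiment-setup row describes a six-layer RLN feeding a two-layer PLN with 300-unit hidden layers. Below is a minimal PyTorch sketch of that split for the regression setting; the input and output dimensions and the use of plain fully connected ReLU layers are assumptions for illustration, not taken from the authors' code.

```python
# Minimal sketch of the representation/prediction split described in the
# experiment-setup row: a six-layer RLN feeding a two-layer PLN, with
# 300-unit hidden layers. Input/output sizes are placeholders.
import torch
import torch.nn as nn

HIDDEN = 300

def make_rln(in_dim: int) -> nn.Sequential:
    """Representation Learning Network: six fully connected ReLU layers."""
    layers, d = [], in_dim
    for _ in range(6):
        layers += [nn.Linear(d, HIDDEN), nn.ReLU()]
        d = HIDDEN
    return nn.Sequential(*layers)

def make_pln(out_dim: int) -> nn.Sequential:
    """Prediction Learning Network: two layers on top of the representation."""
    return nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, out_dim))

rln = make_rln(in_dim=11)   # input plus task-id features; the exact size is an assumption
pln = make_pln(out_dim=1)   # scalar regression target
prediction = pln(rln(torch.randn(8, 11)))  # mini-batch of 8, as in the online updates
```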
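The pseudocode row cites Algorithm 2 (Meta-Training: OML). The following is a hedged sketch of what one OML meta-training step could look like, assuming the inner loop applies single-sample SGD updates to PLN fast weights along a sampled trajectory, and the outer loss is computed on the trajectory plus a random batch and optimized with Adam. The names `pln_forward` and `oml_meta_step` and the data handling are illustrative, not the paper's implementation; see the linked repository for the authors' code.

```python
# Hedged sketch of an OML meta-training step (Algorithm 2), under the
# assumptions stated above. Differentiating through the inner SGD updates
# lets the meta-gradient reach both the RLN and the initial PLN weights.
import torch
import torch.nn.functional as F

def pln_forward(features, fast_weights):
    """Two-layer PLN applied with explicit (fast) weights so the inner loop stays differentiable."""
    w1, b1, w2, b2 = fast_weights
    h = F.relu(F.linear(features, w1, b1))
    return F.linear(h, w2, b2)

def oml_meta_step(rln, pln_params, trajectory, random_batch, meta_opt, inner_lr=1e-3):
    fast = list(pln_params)                  # start the inner loop from the current PLN weights
    for x, y in trajectory:                  # online inner updates, one sample at a time
        loss = F.mse_loss(pln_forward(rln(x), fast), y)
        grads = torch.autograd.grad(loss, fast, create_graph=True)
        fast = [w - inner_lr * g for w, g in zip(fast, grads)]

    # Meta-loss evaluated after the online updates, on the trajectory plus a
    # random batch; gradients flow back through the inner loop.
    xs = torch.cat([x for x, _ in trajectory] + [random_batch[0]])
    ys = torch.cat([y for _, y in trajectory] + [random_batch[1]])
    meta_loss = F.mse_loss(pln_forward(rln(xs), fast), ys)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
    return meta_loss.item()
```

Here `meta_opt` would be an Adam optimizer over both the RLN parameters and `pln_params`, matching the paper's choice of Adam for the OML objective; the inner learning rate shown is a placeholder.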
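Finally, the research-type and experiment-setup rows describe the meta-test protocol: fix the meta-learned representation, learn the PLN online with SGD on the MSE loss in mini-batches of 8, and report prediction error on the CLP problem. A sketch under those assumptions follows; the data stream, evaluation set, and learning rate are placeholders.

```python
# Hedged sketch of the meta-test protocol: the meta-learned RLN is frozen,
# the PLN is trained online with SGD (MSE loss, mini-batches of 8), and
# prediction error is measured afterwards.
import torch
import torch.nn.functional as F

def evaluate_continual_learner(rln, pln, stream, eval_set, lr=1e-3):
    for p in rln.parameters():
        p.requires_grad_(False)             # representation is fixed at meta-test time
    opt = torch.optim.SGD(pln.parameters(), lr=lr)

    for x_batch, y_batch in stream:         # mini-batches of 8, seen once, in task order
        loss = F.mse_loss(pln(rln(x_batch)), y_batch)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():                   # prediction error over all tasks seen so far
        x_all, y_all = eval_set
        return F.mse_loss(pln(rln(x_all)), y_all).item()
```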