Meta-Learning with Warped Gradient Descent

Authors: Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate WarpGrad in a set of experiments designed to answer three questions: (1) do WarpGrad methods retain the inductive bias of MAML-based few-shot learners? (2) Can WarpGrad methods scale to problems beyond the reach of such methods? (3) Can WarpGrad generalise to complex meta-learning problems? ... Warp-MAML outperforms all baselines (Table 1), improving 1- and 5-shot accuracy by 3.6 and 5.5 percentage points on miniImageNet (Vinyals et al., 2016; Ravi & Larochelle, 2016) and by 5.2 and 3.8 percentage points on tieredImageNet (Ren et al., 2018), which indicates that WarpGrad retains the inductive bias of MAML-based meta-learners.
Researcher Affiliation | Collaboration | 1 The University of Manchester, 2 The Alan Turing Institute, 3 DeepMind; {flennerhag,andreirusu,razp,visin,raia}@google.com; hujun.yin@manchester.ac.uk
Pseudocode | Yes | Algorithm 1 WarpGrad: online meta-training (a hedged sketch of this loop follows the table).
Open Source Code | Yes | Open-source implementation available at https://github.com/flennerhag/warpgrad.
Open Datasets | Yes | miniImageNet: This dataset is a subset of 100 classes sampled randomly from the 1000 base classes in the ILSVRC-12 training set... tieredImageNet: As described in (Ren et al., 2018)... Omniglot (Lake et al., 2011)
Dataset Splits | Yes | Classes are split into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes respectively.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | Hyper-parameters were tuned independently for each condition using random grid search (including filter sizes; full experimental settings in Appendix H), and we report the best results from our experiments or the literature. ... 60000 meta-training steps were performed using meta-gradients over a single randomly selected task instance and its entire trajectory of 5 adaptation steps. Task-specific adaptation was done using stochastic gradient descent without momentum. We use Adam (Kingma & Ba, 2015) for meta-updates.
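
The Pseudocode and Experiment Setup rows together describe an online meta-training loop: task parameters are adapted with plain SGD (no momentum) through fixed warp layers for 5 steps, and the warp parameters are then meta-updated with Adam from gradients accumulated along the adaptation trajectory. The sketch below illustrates only that structure under stated assumptions: the two-layer toy architecture with a single linear warp layer, the identity initialisation of the warp, the learning rates, the synthetic task sampler, and the use of a held-out batch for the meta-objective are all illustrative choices, not the paper's settings. The authors' actual implementation is the repository linked in the table (https://github.com/flennerhag/warpgrad).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM, HID, CLASSES = 32, 64, 5          # toy sizes (assumption)
INNER_LR, META_LR = 0.1, 1e-3          # learning rates (assumption)
ADAPT_STEPS, META_STEPS = 5, 100       # 5 adaptation steps as quoted; 100 meta-steps here
                                       # instead of the paper's 60000, for brevity

def forward(x, theta, phi):
    """Task layer -> linear warp layer (held fixed during adaptation) -> task head."""
    h = torch.relu(x @ theta["w1"] + theta["b1"])
    h = h @ phi["warp"]                # the warp layer shapes gradients w.r.t. theta
    return h @ theta["w2"] + theta["b2"]

def init_task_params():
    """Fresh task parameters theta for each sampled task."""
    return {
        "w1": 0.1 * torch.randn(DIM, HID),
        "b1": torch.zeros(HID),
        "w2": 0.1 * torch.randn(HID, CLASSES),
        "b2": torch.zeros(CLASSES),
    }

def sample_task():
    """Synthetic classification task; a stand-in for the few-shot benchmarks."""
    w = torch.randn(DIM, CLASSES)
    def batch(n=20):
        x = torch.randn(n, DIM)
        return x, (x @ w).argmax(dim=1)
    return batch

# Warp parameters phi, meta-learned with Adam (identity init is an assumption).
phi = {"warp": torch.eye(HID).requires_grad_()}
warp_opt = torch.optim.Adam(phi.values(), lr=META_LR)

for _ in range(META_STEPS):
    task = sample_task()
    theta = init_task_params()
    meta_loss = 0.0
    for _ in range(ADAPT_STEPS):
        # Treat theta at this trajectory point as a constant for the meta-objective.
        theta = {k: p.detach().requires_grad_() for k, p in theta.items()}
        x_tr, y_tr = task()
        inner = F.cross_entropy(forward(x_tr, theta, phi), y_tr)
        grads = torch.autograd.grad(inner, list(theta.values()), create_graph=True)
        # Warped SGD step on the task parameters (no momentum), as in the quoted setup.
        stepped = {k: p - INNER_LR * g for (k, p), g in zip(theta.items(), grads)}
        # Meta-objective term: loss after the warped step, differentiated w.r.t. phi
        # only (evaluating on a held-out batch is an assumption).
        x_val, y_val = task()
        meta_loss = meta_loss + F.cross_entropy(forward(x_val, stepped, phi), y_val)
        # Continue the adaptation trajectory from the stepped parameters.
        theta = {k: p.detach() for k, p in stepped.items()}
    warp_opt.zero_grad()
    meta_loss.backward()               # accumulates warp gradients over the trajectory
    warp_opt.step()                    # Adam meta-update of the warp parameters
```

Re-leafing theta at every trajectory point keeps earlier adaptation steps out of the meta-gradient, so only the warp parameters receive gradients through the one-step look-ahead; consult Algorithm 1 in the paper and the linked repository for the exact meta-objective and architectures.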