Meta-Learning with Warped Gradient Descent

Authors: Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate WarpGrad in a set of experiments designed to answer three questions: (1) do WarpGrad methods retain the inductive bias of MAML-based few-shot learners? (2) Can WarpGrad methods scale to problems beyond the reach of such methods? (3) Can WarpGrad generalise to complex meta-learning problems? ... Warp-MAML outperforms all baselines (Table 1), improving 1- and 5-shot accuracy by 3.6 and 5.5 percentage points on miniImageNet (Vinyals et al., 2016; Ravi & Larochelle, 2016) and by 5.2 and 3.8 percentage points on tieredImageNet (Ren et al., 2018), which indicates that WarpGrad retains the inductive bias of MAML-based meta-learners.
Researcher Affiliation | Collaboration | 1 The University of Manchester, 2 The Alan Turing Institute, 3 DeepMind; {flennerhag,andreirusu,razp,visin,raia}@google.com; hujun.yin@manchester.ac.uk
Pseudocode | Yes | Algorithm 1 WarpGrad: online meta-training (a hedged sketch of this loop follows the table).
Open Source Code | Yes | Open-source implementation available at https://github.com/flennerhag/warpgrad.
Open Datasets | Yes | miniImageNet: This dataset is a subset of 100 classes sampled randomly from the 1000 base classes in the ILSVRC-12 training set... tieredImageNet: As described in (Ren et al., 2018)... Omniglot (Lake et al., 2011)
Dataset Splits | Yes | Classes are split into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes respectively.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | Hyper-parameters were tuned independently for each condition using random grid search (including filter sizes; full experimental settings in Appendix H), and we report the best results from our experiments or the literature. ... 60000 meta-training steps were performed using meta-gradients over a single randomly selected task instance and its entire trajectory of 5 adaptation steps. Task-specific adaptation was done using stochastic gradient descent without momentum. We use Adam (Kingma & Ba, 2015) for meta-updates.
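
The Pseudocode and Experiment Setup rows together describe an online meta-training loop: task parameters are adapted with plain SGD (no momentum) through fixed warp layers for 5 steps, and the warp parameters are then meta-updated with Adam from gradients accumulated along the adaptation trajectory. The sketch below illustrates only that structure under stated assumptions: the two-layer toy architecture with a single linear warp layer, the identity initialisation of the warp, the learning rates, the synthetic task sampler, and the use of a held-out batch for the meta-objective are all illustrative choices, not the paper's settings. The authors' actual implementation is the repository linked in the table (https://github.com/flennerhag/warpgrad).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
DIM, HID, CLASSES = 32, 64, 5          # toy sizes (assumption)
INNER_LR, META_LR = 0.1, 1e-3          # learning rates (assumption)
ADAPT_STEPS, META_STEPS = 5, 100       # 5 adaptation steps as quoted; 100 meta-steps here
                                       # instead of the paper's 60000, for brevity

def forward(x, theta, phi):
    """Task layer -> linear warp layer (held fixed during adaptation) -> task head."""
    h = torch.relu(x @ theta["w1"] + theta["b1"])
    h = h @ phi["warp"]                # the warp layer shapes gradients w.r.t. theta
    return h @ theta["w2"] + theta["b2"]

def init_task_params():
    """Fresh task parameters theta for each sampled task."""
    return {
        "w1": 0.1 * torch.randn(DIM, HID),
        "b1": torch.zeros(HID),
        "w2": 0.1 * torch.randn(HID, CLASSES),
        "b2": torch.zeros(CLASSES),
    }

def sample_task():
    """Synthetic classification task; a stand-in for the few-shot benchmarks."""
    w = torch.randn(DIM, CLASSES)
    def batch(n=20):
        x = torch.randn(n, DIM)
        return x, (x @ w).argmax(dim=1)
    return batch

# Warp parameters phi, meta-learned with Adam (identity init is an assumption).
phi = {"warp": torch.eye(HID).requires_grad_()}
warp_opt = torch.optim.Adam(phi.values(), lr=META_LR)

for _ in range(META_STEPS):
    task = sample_task()
    theta = init_task_params()
    meta_loss = 0.0
    for _ in range(ADAPT_STEPS):
        # Treat theta at this trajectory point as a constant for the meta-objective.
        theta = {k: p.detach().requires_grad_() for k, p in theta.items()}
        x_tr, y_tr = task()
        inner = F.cross_entropy(forward(x_tr, theta, phi), y_tr)
        grads = torch.autograd.grad(inner, list(theta.values()), create_graph=True)
        # Warped SGD step on the task parameters (no momentum), as in the quoted setup.
        stepped = {k: p - INNER_LR * g for (k, p), g in zip(theta.items(), grads)}
        # Meta-objective term: loss after the warped step, differentiated w.r.t. phi
        # only (evaluating on a held-out batch is an assumption).
        x_val, y_val = task()
        meta_loss = meta_loss + F.cross_entropy(forward(x_val, stepped, phi), y_val)
        # Continue the adaptation trajectory from the stepped parameters.
        theta = {k: p.detach() for k, p in stepped.items()}
    warp_opt.zero_grad()
    meta_loss.backward()               # accumulates warp gradients over the trajectory
    warp_opt.step()                    # Adam meta-update of the warp parameters
```

Re-leafing theta at every trajectory point keeps earlier adaptation steps out of the meta-gradient, so only the warp parameters receive gradients through the one-step look-ahead; consult Algorithm 1 in the paper and the linked repository for the exact meta-objective and architectures.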