Meta-Learning with Warped Gradient Descent
Authors: Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate WarpGrad in a set of experiments designed to answer three questions: (1) do WarpGrad methods retain the inductive bias of MAML-based few-shot learners? (2) Can WarpGrad methods scale to problems beyond the reach of such methods? (3) Can WarpGrad generalise to complex meta-learning problems? ... Warp-MAML outperforms all baselines (Table 1), improving 1- and 5-shot accuracy by 3.6 and 5.5 percentage points on miniImageNet (Vinyals et al., 2016; Ravi & Larochelle, 2016) and by 5.2 and 3.8 percentage points on tieredImageNet (Ren et al., 2018), which indicates that WarpGrad retains the inductive bias of MAML-based meta-learners. |
| Researcher Affiliation | Collaboration | ¹The University of Manchester, ²The Alan Turing Institute, ³DeepMind {flennerhag,andreirusu,razp,visin,raia}@google.com hujun.yin@manchester.ac.uk |
| Pseudocode | Yes | Algorithm 1 WarpGrad: online meta-training (a simplified sketch of this loop is given after the table) |
| Open Source Code | Yes | Open-source implementation available at https://github.com/flennerhag/warpgrad. |
| Open Datasets | Yes | miniImageNet This dataset is a subset of 100 classes sampled randomly from the 1000 base classes in the ILSVRC-12 training set... tieredImageNet As described in (Ren et al., 2018)... Omniglot (Lake et al., 2011) |
| Dataset Splits | Yes | classes are split into non-overlapping meta-training, meta-validation and meta-test sets with 64, 16, and 20 classes in each respectively. (A minimal sketch of such a class-level split is given after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not specify the version numbers for any software dependencies or libraries used for the experiments. |
| Experiment Setup | Yes | Hyper-parameters were tuned independently for each condition using random grid search (including filter sizes; full experimental settings in Appendix H), and we report best results from our experiments or the literature. ... 60,000 meta-training steps were performed using meta-gradients over a single randomly selected task instance and its entire trajectory of 5 adaptation steps. Task-specific adaptation was done using stochastic gradient descent without momentum. We use Adam (Kingma & Ba, 2015) for meta-updates. |
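The quoted pseudocode reference (Algorithm 1) and the experiment setup can be pulled together into a rough picture of the training loop. Below is a minimal, illustrative sketch of a WarpGrad-style online meta-training loop in PyTorch, assuming a toy two-layer network with interleaved warp layers, an SGD inner loop of 5 steps, and Adam for the meta-update, as quoted above. It is not the authors' implementation (see the linked repository); the network sizes, learning rates, and the `toy_batches` task sampler are placeholders, and learning the task-parameter initialisation (as in Warp-MAML) is omitted.

```python
# Minimal sketch of a WarpGrad-style online meta-training loop (Algorithm 1),
# simplified for illustration; not the authors' released implementation
# (https://github.com/flennerhag/warpgrad). Task layers are adapted with plain
# SGD while warp layers stay fixed; the warp layers are then meta-updated with
# Adam so that one warped gradient step from states visited along the
# trajectory lowers the task loss.

import torch
import torch.nn.functional as F


def init_params(d_in=4, d_hidden=32, d_out=2):
    """Two task (adaptable) linear layers interleaved with two warp layers."""
    def layer(m, n):
        w = (torch.randn(n, m) / m ** 0.5).requires_grad_()
        b = torch.zeros(n, requires_grad=True)
        return w, b
    task = [layer(d_in, d_hidden), layer(d_hidden, d_out)]
    warp = [layer(d_hidden, d_hidden), layer(d_out, d_out)]
    return task, warp


def forward(x, task, warp):
    """Interleave task and warp layers: warp(relu(task(x)))."""
    for (tw, tb), (ww, wb) in zip(task, warp):
        x = F.linear(F.relu(F.linear(x, tw, tb)), ww, wb)
    return x


def flatten(params):
    return [p for pair in params for p in pair]


def adapt(task, warp, batches, lr=0.1, steps=5):
    """Task adaptation: SGD on task parameters only; record visited states."""
    params = [(w.detach().clone().requires_grad_(),
               b.detach().clone().requires_grad_()) for w, b in task]
    trajectory = []
    for _ in range(steps):
        x, y = next(batches)
        trajectory.append([(w.detach().clone(), b.detach().clone())
                           for w, b in params])
        loss = F.cross_entropy(forward(x, params, warp), y)
        grads = torch.autograd.grad(loss, flatten(params))
        with torch.no_grad():
            for p, g in zip(flatten(params), grads):
                p -= lr * g
    return trajectory


def meta_step(warp, trajectory, batches, meta_opt, lr=0.1):
    """From each visited state, take one differentiable warped SGD step on the
    task parameters and minimise the resulting loss w.r.t. the warp layers."""
    meta_opt.zero_grad()
    for state in trajectory:
        params = [(w.clone().requires_grad_(), b.clone().requires_grad_())
                  for w, b in state]
        x, y = next(batches)
        loss = F.cross_entropy(forward(x, params, warp), y)
        grads = torch.autograd.grad(loss, flatten(params), create_graph=True)
        stepped = [p - lr * g for p, g in zip(flatten(params), grads)]
        new_params = [(stepped[2 * i], stepped[2 * i + 1])
                      for i in range(len(params))]
        x2, y2 = next(batches)
        meta_loss = F.cross_entropy(forward(x2, new_params, warp), y2)
        (meta_loss / len(trajectory)).backward()  # accumulates warp gradients
    meta_opt.step()


def toy_batches(d_in=4, batch_size=8):
    """Hypothetical stand-in for an episodic task sampler (e.g. Omniglot)."""
    while True:
        x = torch.randn(batch_size, d_in)
        yield x, (x.sum(dim=1) > 0).long()


if __name__ == "__main__":
    torch.manual_seed(0)
    task, warp = init_params()
    meta_opt = torch.optim.Adam(flatten(warp), lr=1e-3)  # Adam for meta-updates
    batches = toy_batches()
    for _ in range(20):  # the paper reports 60,000 meta-training steps
        trajectory = adapt(task, warp, batches, lr=0.1, steps=5)
        meta_step(warp, trajectory, batches, meta_opt, lr=0.1)
```

The trajectory buffer mirrors the online approximation described in the paper: meta-gradients are taken from parameter states visited during adaptation rather than by backpropagating through the full adaptation path.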
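The split quoted under Dataset Splits is a class-level partition. A minimal sketch of how such a non-overlapping 64/16/20 split can be constructed is shown below; the class names and shuffle seed are placeholders, and the actual miniImageNet split follows Ravi & Larochelle (2016) rather than a fresh random draw.

```python
# Sketch of a class-level meta-split in the style quoted above
# (64 meta-train / 16 meta-validation / 20 meta-test classes out of 100).
import random

classes = [f"class_{i:03d}" for i in range(100)]  # placeholder class names
rng = random.Random(0)                            # placeholder seed
rng.shuffle(classes)

meta_train, meta_val, meta_test = classes[:64], classes[64:80], classes[80:]
assert not (set(meta_train) & set(meta_val)
            | set(meta_train) & set(meta_test)
            | set(meta_val) & set(meta_test))     # splits are non-overlapping
```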