Meta-Learning with Implicit Gradients

Authors: Aravind Rajeswaran, Chelsea Finn, Sham M. Kakade, Sergey Levine

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks. In our experimental evaluation, we aim to answer the following questions empirically: (1) Does the iMAML algorithm asymptotically compute the exact meta-gradient? (2) With finite iterations, does iMAML approximate the meta-gradient more accurately compared to MAML? (3) How do the computation and memory requirements of iMAML compare with MAML? (4) Does iMAML lead to better results in realistic meta-learning problems? We have answered (1)-(3) through our theoretical analysis, and now attempt to validate them through numerical simulations. For (1) and (2), we will use a simple synthetic example for which we can compute the exact meta-gradient and compare against it (exact-solve error, see definition 3). For (3) and (4), we will use the common few-shot image recognition domains of Omniglot and MiniImageNet.
Researcher Affiliation | Academia | Aravind Rajeswaran (University of Washington, aravraj@cs.washington.edu); Chelsea Finn (University of California Berkeley, cbfinn@cs.stanford.edu); Sham M. Kakade (University of Washington, sham@cs.washington.edu); Sergey Levine (University of California Berkeley, svlevine@eecs.berkeley.edu)
Pseudocode | Yes | Algorithm 1 Implicit Model-Agnostic Meta-Learning (iMAML) and Algorithm 2 Implicit Meta-Gradient Computation
Open Source Code | Yes | Project page: http://sites.google.com/view/imaml (This project page links to https://github.com/rllab/imaml_code, which hosts the iMAML code.)
Open Datasets | Yes | To study (3), we turn to the Omniglot dataset [30] which is a popular few-shot image recognition domain. ... Finally, we study empirical performance of iMAML on the Omniglot and MiniImageNet domains.
Dataset Splits | No | The paper describes how tasks are sampled from a distribution P(T) and how each task has D_i^tr and D_i^test sets. However, it does not specify a global train/validation/test split for the overall Omniglot or MiniImageNet datasets, only the structure within individual tasks for meta-learning.
Hardware Specification | Yes | On the other hand, memory for MAML grows linearly in grad steps, reaching the capacity of a 12 GB GPU in approximately 16 steps.
Software Dependencies | No | The paper mentions "implemented iMAML in PyTorch" but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | iMAML with gradient descent (GD) uses 16 and 25 steps for 5-way and 20-way tasks respectively. iMAML with Hessian-free uses 5 CG steps to compute the search direction and performs line-search to pick the step size. Both versions of iMAML use λ = 2.0 for regularization, and 5 CG steps to compute the task meta-gradient. We used λ = 0.5 and 10 gradient steps in the inner loop. ... 5 CG steps were used to compute the meta-gradient. The Hessian-free version also uses 5 CG steps for the search direction.
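The per-task meta-gradient computation referenced above (Algorithm 2) amounts to approximately solving the linear system (I + (1/λ) ∇²L̂(φ)) v = g with a few conjugate-gradient steps, where g is the test-loss gradient at the adapted parameters φ. A minimal NumPy sketch on a toy quadratic inner loss (where the Hessian is a constant matrix) illustrates the mechanics; the function names and the toy task are illustrative assumptions, not taken from the paper's code release:

```python
import numpy as np

def conjugate_gradient(mv, g, n_steps=5):
    """Approximately solve mv(x) = g with n_steps of CG.

    mv is a matrix-vector product; in iMAML-style methods it would be a
    Hessian-vector product, so the Hessian never needs to be materialized.
    """
    x = np.zeros_like(g)
    r = g - mv(x)          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(n_steps):
        Ap = mv(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def implicit_meta_gradient(hessian, test_grad, lam=2.0, cg_steps=5):
    """Solve (I + (1/lam) H) v = test_grad; v is the task meta-gradient."""
    mv = lambda v: v + (1.0 / lam) * (hessian @ v)
    return conjugate_gradient(mv, test_grad, cg_steps)

# Toy 3-D quadratic task; lambda = 2.0 matches the setting quoted above.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
H = M @ M.T + np.eye(3)           # SPD Hessian of the inner-loop loss
g = rng.standard_normal(3)        # test-loss gradient at adapted params
v = implicit_meta_gradient(H, g, lam=2.0, cg_steps=20)
exact = np.linalg.solve(np.eye(3) + H / 2.0, g)
print(np.allclose(v, exact, atol=1e-6))  # prints True: CG matches the exact solve
```

With enough CG steps the approximation converges to the exact solve (the "exact-solve error" the paper measures goes to zero); the experiments above use only 5 CG steps per task as a compute/accuracy trade-off.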