Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Authors: Momin Abbas, Quan Xiao, Lisha Chen, Pin-Yu Chen, Tianyi Chen

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., +3% accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning.
Researcher Affiliation | Collaboration | (1) Rensselaer Polytechnic Institute, Troy, NY; (2) IBM Thomas J. Watson Research Center, NY, USA.
Pseudocode | Yes | Algorithm 1: Pseudo-code for Sharp-MAML_both; red lines need to be modified for Sharp-MAML_up; blue lines need to be modified for Sharp-MAML_low. (See the SAM-step sketch below the table.)
Open Source Code | Yes | The code is available at https://github.com/mominabbass/Sharp-MAML.
Open Datasets | Yes | We evaluate Sharp-MAML on 5-way 1-shot and 5-way 5-shot settings on the Mini-Imagenet dataset and present the results on the Omniglot dataset in Appendix E.
Dataset Splits | No | The paper mentions a "separate validation set D_m = {(x_i, y_i)}_{i=1}^n" in Section 2.1 as part of the MAML problem formulation, but does not provide specific details on the validation split percentages or sample counts for the experiments conducted.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using "Adam as the base optimizer" and "the open-source SAM PyTorch implementation" but does not specify version numbers for PyTorch or other libraries. It also mentions results "reproduced using the Torchmeta (Deleu et al., 2019) library", again without a version. (See the Torchmeta loading sketch below the table.)
Experiment Setup | Yes | The models were trained using the SAM algorithm with Adam as the base optimizer and learning rate α = 0.001. ... The values of α_low and α_up are taken from the set {0.05, 0.005, 0.0005, 0.00005}, and each experiment is run on each value for three random seeds. The number of inner gradient steps is chosen from the set {3, 5, 7, 10}, and the step size from the set {0.1, 0.01, 0.001}. For Sharp-MAML_both we use the same value of α_low and α_up in each experiment. ... We use only one inner gradient step with a 0.1 learning rate for training and testing in all experiments. The batch size was set to 16 for the 20-way learning setting.
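
The Pseudocode and Experiment Setup rows refer to SAM-style updates applied at the lower (task-adaptation) and/or upper (meta-update) level of MAML. Below is a minimal, generic sketch of one such SAM step in PyTorch, assuming a list of parameters, a loss closure, and Adam as the base optimizer; it is not the authors' Algorithm 1, and the names sam_step, rho, and loss_closure are illustrative. In Sharp-MAML_low this step would wrap the inner-loop adaptation, in Sharp-MAML_up the outer meta-update, and in Sharp-MAML_both both, with rho playing the role of α_low or α_up.

```python
import torch


def sam_step(params, loss_closure, base_optimizer, rho):
    """One SAM update (sketch): take an ascent step of radius `rho` towards a
    nearby high-loss point, then descend with the base optimizer using the
    gradient evaluated at that perturbed point."""
    # 1) Gradient at the current parameters.
    base_optimizer.zero_grad()
    loss = loss_closure()
    loss.backward()
    grad_norm = torch.sqrt(
        sum((p.grad ** 2).sum() for p in params if p.grad is not None)
    ) + 1e-12

    # 2) SAM ascent: perturb each parameter along its normalized gradient.
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = torch.zeros_like(p) if p.grad is None else rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)

    # 3) Gradient at the perturbed point.
    base_optimizer.zero_grad()
    loss_closure().backward()

    # 4) Undo the perturbation and let the base optimizer (Adam in the paper)
    #    step using the gradients computed at the perturbed point.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()
```

A hypothetical outer-level call (Sharp-MAML_up) would pass the meta-parameters, a closure computing the query-set (meta) loss over a task batch, torch.optim.Adam with lr = 0.001, and rho = α_up; applying the same step to the per-task fast weights with rho = α_low gives the lower-level variant.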
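
The Software Dependencies row notes that results were reproduced with Torchmeta but without a pinned version. The snippet below is a sketch of how 5-way 1-shot Mini-Imagenet episodes are typically loaded through Torchmeta's helper API; it is not taken from the authors' repository, and the data folder, batch size, worker count, and 15 query shots per class are assumptions.

```python
from torchmeta.datasets.helpers import miniimagenet
from torchmeta.utils.data import BatchMetaDataLoader

# 5-way 1-shot Mini-Imagenet episodes (15 query examples per class is an assumed choice).
dataset = miniimagenet("data", ways=5, shots=1, test_shots=15,
                       meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=4, num_workers=4)

for batch in dataloader:
    support_inputs, support_targets = batch["train"]   # used for the inner (adaptation) step
    query_inputs, query_targets = batch["test"]        # used for the outer (meta) objective
    break
```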