Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

Authors: Momin Abbas, Quan Xiao, Lisha Chen, Pin-Yu Chen, Tianyi Chen

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., +3% accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning.
Researcher Affiliation | Collaboration | (1) Rensselaer Polytechnic Institute, Troy, NY; (2) IBM Thomas J. Watson Research Center, NY, USA.
Pseudocode | Yes | Algorithm 1: Pseudo-code for Sharp-MAML_both; red lines need to be modified for Sharp-MAML_up; blue lines need to be modified for Sharp-MAML_low. (See the SAM-step sketch below the table.)
Open Source Code | Yes | The code is available at https://github.com/mominabbass/Sharp-MAML.
Open Datasets | Yes | We evaluate Sharp-MAML on 5-way 1-shot and 5-way 5-shot settings on the Mini-Imagenet dataset and present the results on the Omniglot dataset in Appendix E.
Dataset Splits | No | The paper mentions a "separate validation set D_m = {(x_i, y_i)}_{i=1}^n" in Section 2.1 as part of the MAML problem formulation, but does not provide specific details on the validation split percentages or sample counts for the experiments conducted.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using "Adam as the base optimizer" and "the open-source SAM PyTorch implementation" but does not specify version numbers for PyTorch or other libraries. It also mentions results "reproduced using the Torchmeta (Deleu et al., 2019) library", again without a version. (See the Torchmeta loading sketch below the table.)
Experiment Setup | Yes | The models were trained using the SAM algorithm with Adam as the base optimizer and learning rate α = 0.001. ... The values of α_low and α_up are taken from the set {0.05, 0.005, 0.0005, 0.00005}, and each experiment is run on each value for three random seeds. The number of inner gradient steps is chosen from the set {3, 5, 7, 10}, and the step size from the set {0.1, 0.01, 0.001}. For Sharp-MAML_both we use the same value of α_low and α_up in each experiment. ... We use only one inner gradient step with a 0.1 learning rate for training and testing in all experiments. The batch size was set to 16 for the 20-way learning setting.
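
The Pseudocode and Experiment Setup rows refer to SAM-style updates applied at the lower (task-adaptation) and/or upper (meta-update) level of MAML. Below is a minimal, generic sketch of one such SAM step in PyTorch, assuming a list of parameters, a loss closure, and Adam as the base optimizer; it is not the authors' Algorithm 1, and the names sam_step, rho, and loss_closure are illustrative. In Sharp-MAML_low this step would wrap the inner-loop adaptation, in Sharp-MAML_up the outer meta-update, and in Sharp-MAML_both both, with rho playing the role of α_low or α_up.

```python
import torch


def sam_step(params, loss_closure, base_optimizer, rho):
    """One SAM update (sketch): take an ascent step of radius `rho` towards a
    nearby high-loss point, then descend with the base optimizer using the
    gradient evaluated at that perturbed point."""
    # 1) Gradient at the current parameters.
    base_optimizer.zero_grad()
    loss = loss_closure()
    loss.backward()
    grad_norm = torch.sqrt(
        sum((p.grad ** 2).sum() for p in params if p.grad is not None)
    ) + 1e-12

    # 2) SAM ascent: perturb each parameter along its normalized gradient.
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = torch.zeros_like(p) if p.grad is None else rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)

    # 3) Gradient at the perturbed point.
    base_optimizer.zero_grad()
    loss_closure().backward()

    # 4) Undo the perturbation and let the base optimizer (Adam in the paper)
    #    step using the gradients computed at the perturbed point.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
    return loss.item()
```

A hypothetical outer-level call (Sharp-MAML_up) would pass the meta-parameters, a closure computing the query-set (meta) loss over a task batch, torch.optim.Adam with lr = 0.001, and rho = α_up; applying the same step to the per-task fast weights with rho = α_low gives the lower-level variant.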
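
The Software Dependencies row notes that results were reproduced with Torchmeta but without a pinned version. The snippet below is a sketch of how 5-way 1-shot Mini-Imagenet episodes are typically loaded through Torchmeta's helper API; it is not taken from the authors' repository, and the data folder, batch size, worker count, and 15 query shots per class are assumptions.

```python
from torchmeta.datasets.helpers import miniimagenet
from torchmeta.utils.data import BatchMetaDataLoader

# 5-way 1-shot Mini-Imagenet episodes (15 query examples per class is an assumed choice).
dataset = miniimagenet("data", ways=5, shots=1, test_shots=15,
                       meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=4, num_workers=4)

for batch in dataloader:
    support_inputs, support_targets = batch["train"]   # used for the inner (adaptation) step
    query_inputs, query_targets = batch["test"]        # used for the outer (meta) objective
    break
```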