Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning
Authors: Momin Abbas, Quan Xiao, Lisha Chen, Pin-Yu Chen, Tianyi Chen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., +3% accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning. |
| Researcher Affiliation | Collaboration | Rensselaer Polytechnic Institute, Troy, NY, USA; IBM Thomas J. Watson Research Center, NY, USA. |
| Pseudocode | Yes | Algorithm 1: Pseudo-code for Sharp-MAML_both; the red lines are modified to obtain Sharp-MAML_up and the blue lines to obtain Sharp-MAML_low. A hedged sketch of this update appears below the table. |
| Open Source Code | Yes | The code is available at https://github.com/mominabbass/Sharp-MAML. |
| Open Datasets | Yes | We evaluate Sharp-MAML on 5-way 1-shot and 5-way 5-shot settings on the Mini-Imagenet dataset and present the results on Omniglot dataset in Appendix E. |
| Dataset Splits | No | The paper mentions a "separate validation set D_m = {(x_i, y_i)}_{i=1}^n" in Section 2.1 as part of the MAML problem formulation, but does not provide specific details on the validation split percentages or sample counts for the experiments conducted. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "Adam as the base optimizer" and "the open-source SAM PyTorch implementation" but does not specify version numbers for PyTorch or other libraries. It also mentions "reproduced using the Torchmeta (Deleu et al., 2019) library" without a version. |
| Experiment Setup | Yes | The models were trained using the SAM algorithm with Adam as the base optimizer and learning rate α = 0.001. ... The values of α_low, α_up are taken from the set {0.05, 0.005, 0.0005, 0.00005}, and each experiment is run on each value for three random seeds. The inner gradient steps are chosen from the set {3, 5, 7, 10} and the step size from the set {0.1, 0.01, 0.001}. For Sharp-MAML_both the same value of α_low and α_up is used in each experiment. ... Only one inner gradient step with a 0.1 learning rate is used for all training and testing experiments. The batch size was set to 16 for the 20-way learning setting. A hedged sketch of the corresponding hyperparameter sweep also appears below the table. |
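
The Pseudocode row refers to Algorithm 1, which applies SAM perturbations at one or both levels of the MAML bilevel problem. The snippet below is a minimal, first-order PyTorch sketch of a Sharp-MAML_both-style meta-update, not the released implementation: the names `forward`, `sam_ascent`, and `sharp_maml_both_step` are hypothetical, the meta-gradient uses a first-order (FOMAML-like) approximation rather than back-propagating through adaptation, and the default radii and inner learning rate simply echo values quoted in the Experiment Setup row.

```python
import torch
import torch.nn.functional as F


def sam_ascent(params, grads, rho):
    # SAM ascent step: move the parameters toward the locally worst-case
    # point inside an L2 ball of radius rho around the current iterate.
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    return [p + rho * g / grad_norm for p, g in zip(params, grads)]


def sharp_maml_both_step(forward, params, support, query,
                         rho_low=0.0005, rho_up=0.0005, inner_lr=0.1):
    """One first-order meta-update in the spirit of Sharp-MAML_both:
    a SAM perturbation at the inner (adaptation) level and another at the
    outer (meta) level. `forward(params, x)` is a stateless model call and
    every tensor in `params` has requires_grad=True."""
    (x_s, y_s), (x_q, y_q) = support, query

    # Lower level: SAM-perturbed adaptation on the support set.
    grads = torch.autograd.grad(
        F.cross_entropy(forward(params, x_s), y_s), params)
    noisy = sam_ascent(params, grads, rho_low)
    grads_adv = torch.autograd.grad(
        F.cross_entropy(forward(noisy, x_s), y_s), noisy)
    adapted = [p - inner_lr * g for p, g in zip(params, grads_adv)]

    # Upper level: SAM-perturbed meta-loss on the query set.
    # (First-order approximation: the meta-gradient is evaluated at the
    # adapted weights instead of being back-propagated through adaptation.)
    meta_grads = torch.autograd.grad(
        F.cross_entropy(forward(adapted, x_q), y_q), adapted)
    noisy_adapted = sam_ascent(adapted, meta_grads, rho_up)
    meta_grads_adv = torch.autograd.grad(
        F.cross_entropy(forward(noisy_adapted, x_q), y_q), noisy_adapted)
    # These gradients would be written into the meta-parameters' .grad
    # fields and applied by the outer optimizer (Adam in the paper).
    return meta_grads_adv
```

In this reading, Sharp-MAML_up and Sharp-MAML_low would keep only one of the two perturbations (effectively setting `rho_low` or `rho_up` to zero); consult the paper's Algorithm 1 and the released repository for the exact color-coded modifications.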
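
The Experiment Setup row also lists the hyperparameter grids and the Adam base optimizer. The sketch below shows one way such a sweep could be organized; `train_one_config` is a hypothetical callback standing in for a full Sharp-MAML training run, and only the grid values and the Adam learning rate are taken from the quoted setup.

```python
import itertools
import torch

# Hyperparameter grids quoted in the Experiment Setup row; the training
# callback passed to run_sweep is a hypothetical placeholder, not the
# released code.
SAM_RADII = [0.05, 0.005, 0.0005, 0.00005]   # candidate alpha_low / alpha_up
INNER_STEPS = [3, 5, 7, 10]                  # inner gradient steps
STEP_SIZES = [0.1, 0.01, 0.001]              # inner-loop step size
SEEDS = [0, 1, 2]                            # three random seeds per setting


def make_meta_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    # Adam as the base (outer-loop) optimizer with learning rate 0.001,
    # matching the reported setup.
    return torch.optim.Adam(model.parameters(), lr=0.001)


def run_sweep(train_one_config):
    """Grid search over the reported ranges. `train_one_config(**kwargs)`
    is assumed to train and evaluate one Sharp-MAML_both configuration
    and return a validation accuracy."""
    results = {}
    for rho, k, lr_inner, seed in itertools.product(
            SAM_RADII, INNER_STEPS, STEP_SIZES, SEEDS):
        torch.manual_seed(seed)
        # Sharp-MAML_both uses the same radius at both levels.
        results[(rho, k, lr_inner, seed)] = train_one_config(
            alpha_low=rho, alpha_up=rho,
            inner_steps=k, inner_lr=lr_inner, seed=seed)
    return results
```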