How to Train Your MAML to Excel in Few-Shot Classification
Authors: Han-Jia Ye, Wei-Lun Chao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As our paper is heavily driven by empirical observations, we first introduce the three main datasets we experiment on, the neural network architectures we use, and the implementation details. |
| Researcher Affiliation | Academia | Han-Jia Ye, State Key Laboratory for Novel Software Technology, Nanjing University; Wei-Lun Chao, The Ohio State University |
| Pseudocode | Yes | Algorithm 1: Evaluation of the effect of class label permutations on meta-testing tasks. |
| Open Source Code | Yes | Our code is available at https://github.com/Han-Jia/UNICORN-MAML. |
| Open Datasets | Yes | We work on the MiniImageNet (Vinyals et al., 2016), TieredImageNet (Ren et al., 2018a), and CUB (Wah et al., 2011) datasets. |
| Dataset Splits | Yes | MiniImageNet contains 100 semantic classes; each has 600 images. Following (Ravi & Larochelle, 2017), the 100 classes are split into meta-training/validation/testing sets with 64/16/20 (non-overlapping) classes, respectively. |
| Hardware Specification | No | The paper acknowledges support from 'Ohio Supercomputer Center and AWS Cloud Credits for Research' but does not specify any particular GPU, CPU, or other hardware models used for the experiments. |
| Software Dependencies | No | The paper mentions optimization techniques like 'SGD with momentum 0.9 and weight decay 0.0005' and learning rates, but it does not list specific software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | MAML has several hyper-parameters and we select them on the meta-validation set. Specifically, for the outer loop, we learn with at most 10,000 tasks: we group every 100 tasks into an epoch. We apply SGD with momentum 0.9 and weight decay 0.0005. We start with an outer loop learning rate 0.002 for ConvNet and 0.001 for ResNet-12, which are decayed by 0.5 and 0.1 after every 20 epochs for ConvNet and ResNet-12, respectively. For the inner loop, we have to set the number of gradient steps M and the learning rate α (cf. Equation 1). |
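
The Experiment Setup row above is essentially a training configuration. The sketch below illustrates how those quoted hyper-parameters could map onto a MAML outer/inner loop in PyTorch; the tiny stand-in backbone, the inner-loop values `M` and `alpha`, and the commented-out `sample_tasks` helper are placeholders (the paper selects M and α on the meta-validation set), so this is an illustrative sketch under stated assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch >= 2.0 (for torch.func.functional_call).
# Quoted values: SGD with momentum 0.9 and weight decay 0.0005; outer-loop LR
# 0.001 for ResNet-12, decayed by 0.1 every 20 epochs; 100 tasks per epoch,
# at most 10,000 tasks. The linear "backbone", M, and alpha are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 5))  # stand-in for ResNet-12 + 5-way head

outer_opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(outer_opt, step_size=20, gamma=0.1)  # decay 0.1 / 20 epochs

M, alpha = 5, 0.05  # inner-loop steps and learning rate: hypothetical, tuned on meta-validation

def inner_adapt(support_x, support_y):
    """Run M inner-loop SGD steps on the support set, returning 'fast weights'."""
    params = dict(model.named_parameters())
    for _ in range(M):
        loss = F.cross_entropy(functional_call(model, params, (support_x,)), support_y)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {k: p - alpha * g for (k, p), g in zip(params.items(), grads)}
    return params

def outer_step(support_x, support_y, query_x, query_y):
    """One meta-training task: adapt on the support set, update on the query loss."""
    fast = inner_adapt(support_x, support_y)
    outer_loss = F.cross_entropy(functional_call(model, fast, (query_x,)), query_y)
    outer_opt.zero_grad()
    outer_loss.backward()  # gradients flow through the inner loop back to model.parameters()
    outer_opt.step()
    return outer_loss.item()

# Epoch bookkeeping as quoted: 100 tasks per epoch, scheduler stepped per epoch.
# for epoch in range(100):                                # 100 epochs x 100 tasks = 10,000 tasks
#     for task in sample_tasks(100):                      # sample_tasks is a hypothetical helper
#         outer_step(*task)
#     scheduler.step()
```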
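The Pseudocode row references Algorithm 1, which evaluates how sensitive a meta-trained MAML model is to the arbitrary assignment of a test task's classes to the classification head's output units; for a 5-way task there are 5! = 120 such permutations. The sketch below only illustrates that evaluation protocol: `sample_test_task` and `adapt_and_evaluate` are hypothetical stand-ins, not functions from the authors' codebase.

```python
# Illustrative only: measures how much query accuracy varies across the N!
# class-to-head-unit assignments of each meta-testing task. The two callables
# are hypothetical stand-ins for task sampling and MAML inner-loop evaluation.
import itertools
import statistics

def permutation_sensitivity(sample_test_task, adapt_and_evaluate, n_way=5, n_tasks=100):
    gaps, means = [], []
    for _ in range(n_tasks):
        task = sample_test_task()                          # one N-way few-shot task (support + query)
        accs = [adapt_and_evaluate(task, perm)             # accuracy after adapting under this labelling
                for perm in itertools.permutations(range(n_way))]  # 5! = 120 permutations when n_way=5
        gaps.append(max(accs) - min(accs))                 # best-case vs. worst-case gap for this task
        means.append(statistics.mean(accs))
    return statistics.mean(means), statistics.mean(gaps)   # average accuracy and average spread
```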