Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks

Authors: Micah Goldblum, Steven Reich, Liam Fowl, Renkun Ni, Valeriia Cherepanova, Tom Goldstein

ICML 2020

Reproducibility variables, each with the extracted result and the supporting LLM response:
Research Type: Experimental
We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models which are trained classically. In doing so, we introduce and verify several hypotheses for why meta-learned models perform better. Furthermore, we develop a regularizer which boosts the performance of standard training routines for few-shot classification. In many cases, our routine outperforms meta-learning while simultaneously running an order of magnitude faster. In Table 1, we test the performance of meta-learned feature extractors not only with their own fine-tuning algorithm, but with a variety of fine-tuning algorithms. We find that in all cases, the meta-learned feature extractors outperform classically trained models of the same architecture.
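The Table 1 comparison fixes a feature extractor and swaps the few-shot classifier fit on top of it. As a minimal sketch of that kind of protocol (not the paper's exact fine-tuning algorithms; both helper functions and the classifier choices here are illustrative assumptions), one could compare a nearest-centroid head against a small linear head on frozen features:

```python
import torch
import torch.nn.functional as F

def nearest_centroid_accuracy(support_feats, support_labels, query_feats, query_labels):
    """Classify query features by distance to per-class support centroids."""
    classes = support_labels.unique()
    centroids = torch.stack([support_feats[support_labels == c].mean(dim=0) for c in classes])
    dists = torch.cdist(query_feats, centroids)        # (n_query, n_classes)
    preds = classes[dists.argmin(dim=1)]
    return (preds == query_labels).float().mean().item()

def linear_head_accuracy(support_feats, support_labels, query_feats, query_labels, steps=100):
    """Fit a logistic-regression head on frozen support features, then score queries.
    Assumes labels are indices 0..n_classes-1."""
    n_classes = int(support_labels.max().item()) + 1
    head = torch.nn.Linear(support_feats.shape[1], n_classes)
    opt = torch.optim.Adam(head.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(head(support_feats), support_labels).backward()
        opt.step()
    preds = head(query_feats).argmax(dim=1)
    return (preds == query_labels).float().mean().item()
```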
Researcher Affiliation: Academia
University of Maryland, College Park.
Pseudocode: Yes
Algorithm 1: The meta-learning framework. Algorithm 2: Reptile with Weight-Clustering Regularization.
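Algorithm 2 builds on Reptile, so for orientation here is a minimal sketch of the standard Reptile outer update (Nichol et al., 2018). The weight-clustering term enters the inner-loop loss; since its exact form is given in the paper, it appears here only as a hypothetical `weight_cluster_penalty` hook:

```python
import copy
import torch

def reptile_step(model, task_loader, inner_lr=1e-2, outer_lr=0.1, inner_steps=5,
                 weight_cluster_penalty=None):
    """One Reptile outer update: fine-tune a copy of the model on a sampled task,
    then move the shared initialization toward the fine-tuned weights.
    Assumes task_loader yields at least inner_steps batches."""
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    batches = iter(task_loader)
    for _ in range(inner_steps):
        x, y = next(batches)
        loss = torch.nn.functional.cross_entropy(fast(x), y)
        if weight_cluster_penalty is not None:
            # Hypothetical hook standing in for the paper's weight-clustering
            # regularizer (Algorithm 2); the exact term is defined in the paper.
            loss = loss + weight_cluster_penalty(fast)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Outer update: theta <- theta + outer_lr * (theta_tilde - theta)
    with torch.no_grad():
        for p, q in zip(model.parameters(), fast.parameters()):
            p.add_(outer_lr * (q - p))
```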
Open Source Code: Yes
A PyTorch implementation of the feature clustering and hyperplane variation regularizers can be found at: https://github.com/goldblum/FeatureClustering
Open Datasets: Yes
We focus our attention on two datasets: mini-ImageNet and CIFAR-FS. Mini-ImageNet is a pruned and downsized version of the ImageNet classification dataset, consisting of 60,000 84×84 RGB color images from 100 classes (Vinyals et al., 2016). The CIFAR-FS dataset samples images from CIFAR-100 (Bertinetto et al., 2018).
Dataset Splits: Yes
These 100 classes are split into 64, 16, and 20 classes for training, validation, and testing sets, respectively. CIFAR-FS is split in the same way as mini-ImageNet, with 60,000 32×32 RGB color images from 100 classes divided into 64, 16, and 20 classes for training, validation, and testing sets, respectively.
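Few-shot tasks are then drawn episodically from whichever class split is in use. A generic sketch of N-way K-shot episode construction (the sampling details here are conventional for this benchmark setup, not the authors' exact loader):

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample an N-way K-shot task: a support set and a query set
    over n_way classes drawn at random from one split."""
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picks[:k_shot]]
        query += [(img, label) for img in picks[k_shot:]]
    return support, query

# e.g. images_by_class maps each of the 64 training classes to its list of images
```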
Hardware Specification: No
The paper does not provide specific details on the hardware (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies: No
The paper mentions 'a PyTorch implementation' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup: Yes
We incorporate this regularizer into a standard training routine by sampling two images per class in each mini-batch so that we can compute a within-class variance estimate. Then, the total loss function becomes the sum of cross-entropy and R_FC. See Appendix A.2 for experimental details, including training times. Experimental details, as well as results for other values of this coefficient, can be found in Appendix A.3.
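As a hedged sketch of how that setup composes, the snippet below penalizes within-class feature distance (between the two sampled images of each class) normalized by the spread of features around the batch mean, and adds it to cross-entropy. The exact R_FC formula is given in the paper; this normalization, the function names, and `reg_coeff` (the coefficient referenced above) are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def feature_clustering_penalty(feats_a, feats_b, eps=1e-8):
    """Sketch of an R_FC-style term. feats_a, feats_b: (n_classes, d) features
    of the two sampled images per class; the paper's exact formula may differ."""
    within = (feats_a - feats_b).pow(2).sum(dim=1)      # per-class variance estimate
    mu = torch.cat([feats_a, feats_b]).mean(dim=0)      # mean feature over the batch
    spread = (feats_a - mu).pow(2).sum(dim=1) + (feats_b - mu).pow(2).sum(dim=1)
    return (within / (spread + eps)).mean()

def training_loss(logits, labels, feats_a, feats_b, reg_coeff=1.0):
    """Total loss = cross-entropy + reg_coeff * R_FC, per the setup above."""
    return F.cross_entropy(logits, labels) + reg_coeff * feature_clustering_penalty(feats_a, feats_b)
```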