A Closer Look at the Training Strategy for Modern Meta-Learning

Authors: Jiaxin Chen, Xiao-Ming Wu, Yanke Li, Qimai Li, Li-Ming Zhan, Fu-lai Chung

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper conducts a theoretical investigation of this training strategy on generalization. From a stability perspective, we analyze the generalization error bound of generic meta-learning algorithms trained with such a strategy. We show that the S/Q episodic training strategy naturally leads to a counterintuitive generalization bound of O(1/√n), which depends only on the task number n but is independent of the inner-task sample size m. Under the common assumption m ≪ n for few-shot learning, the bound of O(1/√n) implies strong generalization guarantees for modern meta-learning algorithms in the few-shot regime. To further explore the influence of training strategies on generalization, we propose a leave-one-out (LOO) training strategy for meta-learning and compare it with S/Q training. Experiments on standard few-shot regression and classification tasks with popular meta-learning algorithms validate our analysis.
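The S/Q strategy referenced above splits each task's samples into a support set of size m for inner-loop adaptation and a query set of size q for the outer-loop loss; the proposed LOO strategy instead holds out one sample at a time as the query. A minimal sketch of both constructions, assuming a toy dataset keyed by class label (function names here are illustrative and not taken from the paper's repository; the LOO variant is one plausible reading of the strategy):

```python
import random

def make_sq_episode(data_by_class, n_way, m_support, q_query):
    """Sample an N-way episode and split each class's samples into
    m support examples and q query examples (the S/Q strategy)."""
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for label, c in enumerate(classes):
        samples = random.sample(data_by_class[c], m_support + q_query)
        support += [(x, label) for x in samples[:m_support]]
        query += [(x, label) for x in samples[m_support:]]
    return support, query

def make_loo_episodes(data_by_class, n_way, m_support):
    """LOO variant: from m+1 samples per class, hold out one sample
    at a time as the query and adapt on the remaining m."""
    classes = random.sample(list(data_by_class), n_way)
    per_class = {c: random.sample(data_by_class[c], m_support + 1)
                 for c in classes}
    episodes = []
    for held_out in range(m_support + 1):
        support, query = [], []
        for label, c in enumerate(classes):
            samples = per_class[c]
            query.append((samples[held_out], label))
            support += [(x, label) for i, x in enumerate(samples)
                        if i != held_out]
        episodes.append((support, query))
    return episodes

# Toy usage: 3 classes with integer "samples".
data = {c: list(range(10 * c, 10 * c + 10)) for c in range(3)}
s, q = make_sq_episode(data, n_way=3, m_support=5, q_query=1)
print(len(s), len(q))  # 15 support and 3 query examples
```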
Researcher Affiliation | Academia | Jiaxin Chen¹, Xiao-Ming Wu¹, Yanke Li², Qimai Li¹, Li-Ming Zhan¹, and Fu-lai Chung¹; ¹Department of Computing, The Hong Kong Polytechnic University; ²Department of Mathematics, ETH Zurich. {jiax.chen, qee-mai.li, lmzhan.zhan}@connect.polyu.hk, {xiao-ming.wu, korris.chung}@polyu.edu.hk, yankli@student.ethz.ch
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code can be downloaded from https://github.com/jiaxinchen666/meta-theory.
Open Datasets | Yes | Few-shot classification. We follow the standard experimental setting proposed in [35] using the real-life dataset miniImageNet. This dataset has 100 classes and is split into a training set of 64 classes, a test set of 20 classes and a validation set of 16 classes.
Dataset Splits | Yes | This dataset has 100 classes and is split into a training set of 64 classes, a test set of 20 classes and a validation set of 16 classes.
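The split above operates at the class level (disjoint classes across train/validation/test), not at the image level. A minimal sketch of such a 64/16/20 class partition, using placeholder class names; note that the actual miniImageNet benchmark uses the fixed class lists of [35] rather than a random shuffle:

```python
import random

classes = [f"class_{i:03d}" for i in range(100)]  # placeholder names
rng = random.Random(0)  # fixed seed so the partition is reproducible
rng.shuffle(classes)

train_classes = classes[:64]
val_classes = classes[64:80]
test_classes = classes[80:]
print(len(train_classes), len(val_classes), len(test_classes))  # 64 16 20
```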
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers (e.g., specific library or solver versions).
Experiment Setup | Yes | We implement the meta-algorithms MAML [18] and Bilevel Programming [19] using an MLP with two hidden layers of size 40 and ReLU activations. Both the input layer and the output layer have dimensionality 1. ... We implement MAML [18] and ProtoNet [31] using the Conv-4 backbone and follow the implementation details in [8]. We set m = 5, q = 1 for regression and m = 1, q = 1 for classification.
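For the regression setup, the stated backbone is a 1-40-40-1 MLP. A minimal PyTorch sketch of that architecture, shown only to make the stated dimensions concrete; optimizer settings and the MAML inner loop are not reproduced here:

```python
import torch
import torch.nn as nn

# 1-40-40-1 MLP with ReLU activations, matching the description above.
regressor = nn.Sequential(
    nn.Linear(1, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, 1),
)

x = torch.randn(5, 1)      # m = 5 support points of a regression task
print(regressor(x).shape)  # torch.Size([5, 1])
```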