A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

Authors: Qi Wang, Yiqin Lv, Yanghe Feng, Zheng Xie, Jincai Huang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5, Experimental Results and Analysis: This section presents experimental results and examines fast adaptation performance in a distributional sense. Without loss of generality, we take DR-MAML in Example (1) to run experiments. Benchmarks. As in the work of (Collins et al., 2020), we use two commonly seen downstream tasks for meta learning experiments: few-shot regression and image classification. Besides, ablation studies are included to assess other factors' influence and the proposed strategy's scalability. Baselines & Evaluations. Since the primary investigation concerns risk minimization principles, we consider the previously mentioned expected risk minimization, worst-case minimization, and expected tail risk minimization for meta learning. Hence, MAML (empirical risk), TR-MAML (worst-case risk), and DR-MAML (expected tail risk) serve as the examined methods. We evaluate these methods' performance based on the Average, Worst-case, and CVaR_α metrics. For the confidence level to meta-train DR-MAML, we empirically set α = 0.7 for few-shot regression tasks and α = 0.5 for image classification tasks without external configurations.
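Since the row above names the Average, Worst-case, and CVaR_α metrics, a minimal NumPy sketch of how they could be computed from per-task test losses may help. The function name is hypothetical, and CVaR_α is taken here as the mean loss over the worst (1 − α) fraction of tasks, i.e. the expected loss at or above the α-quantile (VaR_α).

```python
import numpy as np

def risk_metrics(task_losses, alpha=0.7):
    """Average, Worst-case, and CVaR_alpha of per-task losses (sketch)."""
    losses = np.asarray(task_losses, dtype=float)
    var_alpha = np.quantile(losses, alpha)    # VaR_alpha: the alpha-quantile
    tail = losses[losses >= var_alpha]        # worst (1 - alpha) tail of tasks
    return {
        "Average": losses.mean(),
        "Worst-case": losses.max(),
        "CVaR_alpha": tail.mean(),
    }
```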
Researcher Affiliation | Academia | Qi Wang (1), Yiqin Lv (1), Yanghe Feng (2), Zheng Xie (1), Jincai Huang (2). (1) College of Science, National University of Defense Technology; (2) College of Systems Engineering, National University of Defense Technology. {wangqi15,lvyiqin98,fengyanghe,xiezheng81,huangjincai}@nudt.edu.cn
Pseudocode | Yes | Appendix B, Pseudo Algorithms of DR-MAML & DR-CNPs:

Algorithm 1: DR-MAML
Input: Task distribution p(τ); confidence level α; task batch size B; learning rates λ_1 and λ_2.
Output: Meta-trained model parameter ϑ.
Randomly initialize the model parameter ϑ;
while not converged do
    Sample a batch of tasks {τ_i}_{i=1}^{B} ~ p(τ);
    // inner loop via gradient descent
    for i = 1 to B do
        Evaluate the gradient ∇_ϑ ℓ(D^C_{τ_i}; ϑ) in Eq. (7);
        Perform task-specific gradient updates: ϑ_i ← ϑ − λ_1 ∇_ϑ ℓ(D^C_{τ_i}; ϑ);
    end
    // estimate VaR_α[ℓ(T, ϑ)] as ξ̂_α
    Evaluate performance L_B = {ℓ(D^T_{τ_i}; ϑ_i)}_{i=1}^{B};
    Estimate VaR_α[ℓ(T, ϑ)] and set ξ = ξ̂_α in Eq. (7) with either percentile rank or density estimators;
    // outer loop via gradient descent
    Screen the subset L̂_B = {ℓ(D^T_{τ̂_i}; ϑ_i)}_{i=1}^{K} with ξ̂_α for meta-initialization updates;
    ϑ ← ϑ − λ_2 ∇_ϑ Σ_{i=1}^{K} ℓ(D^T_{τ̂_i}; ϑ_i) in Eq. (7);
end
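As a concrete reading of Algorithm 1, below is a minimal PyTorch-style sketch of one DR-MAML meta-training step, assuming a regression model trained with squared error, a user-supplied `sample_task_batch` helper returning (context, target) pairs, and the empirical percentile as the VaR_α estimator. All names are illustrative rather than the authors' code.

```python
import torch
from torch.func import functional_call  # PyTorch >= 2.0

def dr_maml_step(model, sample_task_batch, alpha=0.7, B=25,
                 inner_lr=1e-3, outer_lr=1e-3):
    """One DR-MAML meta-training step (illustrative sketch).

    Inner loop: one task-specific SGD step from the shared init.
    VaR_alpha: empirical alpha-percentile of post-adaptation losses.
    Outer loop: gradient step on the screened tail subset only.
    """
    theta = dict(model.named_parameters())        # meta initialization
    post_losses = []
    for (x_c, y_c), (x_t, y_t) in sample_task_batch(B):
        # inner loop: adapt to the context set D^C of task i
        loss_c = torch.nn.functional.mse_loss(
            functional_call(model, theta, (x_c,)), y_c)
        grads = torch.autograd.grad(loss_c, list(theta.values()),
                                    create_graph=True)
        theta_i = {n: p - inner_lr * g
                   for (n, p), g in zip(theta.items(), grads)}
        # evaluate the adapted parameters on the target set D^T
        loss_t = torch.nn.functional.mse_loss(
            functional_call(model, theta_i, (x_t,)), y_t)
        post_losses.append(loss_t)

    losses = torch.stack(post_losses)
    xi_hat = torch.quantile(losses.detach(), alpha)   # VaR_alpha estimate
    tail = losses[losses.detach() >= xi_hat]          # screened tail subset

    # outer loop: update the meta initialization on the tail tasks only,
    # matching the sum over K screened tasks in Algorithm 1
    meta_grads = torch.autograd.grad(tail.sum(), list(theta.values()))
    with torch.no_grad():
        for p, g in zip(theta.values(), meta_grads):
            p -= outer_lr * g
```

Relative to vanilla MAML, the only change is the screening step before the outer update, which is consistent with the paper's remark that its code differs from existing codebases by "simple modification of loss functions".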
Open Source Code | No | The paper mentions modifying existing codebases but does not provide a link to its own open-source code for the described methodology. It states: "And ours is built on top of the above codes except for simple modification of loss functions."
Open Datasets | Yes | The Omniglot dataset consists of 1623 handwritten characters from 50 alphabets, with 20 examples each. The task distribution is uniform over all task instances consisting of characters from one specific alphabet. The dataset split follows the procedure in (Triantafillou et al., 2019). Finally, 25 alphabets are used for meta-training, with 20 alphabets for meta-testing. The number of task batches is 16. The confidence level α = 0.5 is selected with the same criteria as in the few-shot regression tasks. The maximum number of iterations in meta-training is 60000. Since the construction of the Omniglot meta dataset is tied to specific alphabets and the space of task combinations is huge, the randomly sampled tasks used in the evaluations of the main paper's tables may not have been used in meta-training. The mini-ImageNet dataset is pre-processed according to (Larochelle, 2017). In detail, 64 classes are used for meta-training, with the remaining 36 classes for meta-testing.
Dataset Splits | No | The paper mentions meta-training and meta-testing datasets but does not explicitly describe a separate validation split with specific percentages or counts. For example, in the Omniglot dataset section, it states: "Finally, 25 alphabets are used for meta-training, with 20 alphabets for meta-testing."
Hardware Specification | Yes | In this research project, we use NVIDIA 1080-Ti GPUs in computation.
Software Dependencies | Yes | PyTorch (Paszke et al., 2019) serves as the deep learning toolkit for implementing the few-shot image classification experiments, while TensorFlow is the toolkit for the sinusoid few-shot regression experiments.
Experiment Setup | Yes | For the confidence level to meta-train DR-MAML, we empirically set α = 0.7 for few-shot regression tasks and α = 0.5 for image classification tasks without external configurations. ... The number of task batches is 50 for 5-shot and 25 for 10-shot. ... The maximum number of iterations in meta-training is 70000. ... The number of task batches is 16. ... The maximum number of iterations is 60000 in meta-training. ... The number of task batches is 4. The maximum number of iterations is 60000 in meta-training. ... All methods use one stochastic gradient descent step as the inner loop.
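For reference, the hyperparameters quoted above can be collected into a single configuration sketch. The grouping below is an assumption inferred from the order of the quoted fragments and the dataset descriptions (regression, then Omniglot, then mini-ImageNet); it is not stated as code in the paper.

```python
# Hypothetical grouping of the quoted hyperparameters; the assignment of
# settings to benchmarks is inferred from context, not given by the authors.
DR_MAML_CONFIG = {
    "sinusoid_regression": {
        "alpha": 0.7,                                  # confidence level
        "task_batch_size": {"5-shot": 50, "10-shot": 25},
        "max_iterations": 70000,
        "inner_steps": 1,                              # one SGD step
    },
    "omniglot": {
        "alpha": 0.5,
        "task_batch_size": 16,
        "max_iterations": 60000,
        "inner_steps": 1,
    },
    "mini_imagenet": {
        "alpha": 0.5,
        "task_batch_size": 4,
        "max_iterations": 60000,
        "inner_steps": 1,
    },
}
```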