MAML is a Noisy Contrastive Learner in Classification

Authors: Chia-Hsiang Kao, Wei-Chen Chiu, Pin-Yu Chen

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on both mini-ImageNet and Omniglot datasets to validate the consistent improvement brought by our proposed method. In this section, we provide empirical evidence of the supervised contrastiveness of MAML and show that zero-initialization of w0, reduction in the initial norm of w0, or the application of the zeroing trick can speed up the learning profile.
Researcher Affiliation | Collaboration | National Yang Ming Chiao Tung University, Taiwan; IBM Research; chkao.md04@nycu.edu.tw, walon@cs.nctu.edu.tw, pin-yu.chen@ibm.com
Pseudocode | Yes | Appendix A, ORIGINAL MAML AND MAML WITH THE ZEROING TRICK: Algorithm 1 (Second-order MAML), Algorithm 2 (First-order MAML), Algorithm 3 (Second-order MAML with the zeroing trick). A hedged code sketch of Algorithm 3 is given after the table.
Open Source Code | Yes | Code available at https://github.com/IandRover/MAML_noisy_contrasive_learner
Open Datasets | Yes | We conduct our experiments on the mini-ImageNet dataset (Vinyals et al., 2016; Ravi & Larochelle, 2017) and the Omniglot dataset (Lake et al., 2015).
Dataset Splits | Yes | The mini-ImageNet dataset contains 84×84 RGB images of 100 classes from the ImageNet dataset, with 600 samples per class; we split it into 64, 16, and 20 classes for training, validation, and testing, as proposed in (Ravi & Larochelle, 2017). The Omniglot dataset is split into training (1028 classes), validation (172 classes), and testing (423 classes) sets (Vinyals et al., 2016). Both splits are collected in a configuration sketch after the table.
Hardware Specification | Yes | Each experiment is run on either a single NVIDIA 1080-Ti or V100 GPU.
Software Dependencies | No | The paper notes that the implementation builds on Long (2018) and Deleu (2020) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | The models are trained with the softmax cross-entropy loss using the Adam optimizer with an outer-loop learning rate of 0.001 (Antoniou et al., 2019). For mini-ImageNet, the inner-loop step size η is 0.01 and the models are trained for 30000 iterations (Raghu et al., 2020); for Omniglot, the inner-loop learning rate η is 0.4 and the models are trained for 3000 iterations using FOMAML or SOMAML. These settings are gathered in a hyperparameter sketch after the table.
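
To make the Pseudocode row concrete, below is a minimal PyTorch sketch of one outer-loop step of second-order MAML with the zeroing trick, in which the linear classification head is reset to zero at the start of every outer iteration. For simplicity only the head is adapted in the inner loop (in the spirit of the paper's EFIL assumption); the function and variable names are ours, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def somaml_outer_step(encoder, meta_opt, tasks, n_way, feat_dim,
                      inner_lr=0.01, inner_steps=5):
    """One outer-loop update of second-order MAML with the zeroing trick.

    `tasks` is an iterable of ((x_support, y_support), (x_query, y_query)).
    """
    meta_opt.zero_grad()
    outer_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Zeroing trick: every task starts from a zero classification head,
        # so only the encoder is effectively meta-learned.
        w = torch.zeros(n_way, feat_dim, requires_grad=True)

        f_s = encoder(x_s)                      # support-set features
        for _ in range(inner_steps):            # inner-loop adaptation of the head
            loss_s = F.cross_entropy(f_s @ w.t(), y_s)
            # create_graph=True keeps the second-order terms (SOMAML);
            # drop it for a first-order (FOMAML-style) approximation.
            (g,) = torch.autograd.grad(loss_s, w, create_graph=True)
            w = w - inner_lr * g

        # Outer objective: query-set loss under the adapted head.
        outer_loss = outer_loss + F.cross_entropy(encoder(x_q) @ w.t(), y_q)

    outer_loss.backward()                       # gradients flow into the encoder
    meta_opt.step()
    return outer_loss.item()
```

A typical driver would pair this with, e.g., `meta_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)`, matching the outer-loop learning rate quoted in the Experiment Setup row.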
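
The class splits quoted in the Dataset Splits row, collected as a small configuration dictionary (the key names are ours, not the released code's):

```python
# Class counts per split, as reported in the paper.
DATASET_SPLITS = {
    "mini-ImageNet": {"train": 64, "val": 16, "test": 20},      # 100 classes total, 600 images each, 84x84 RGB
    "Omniglot":      {"train": 1028, "val": 172, "test": 423},  # split of Vinyals et al. (2016)
}
```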
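
Likewise, the Experiment Setup row can be summarized as a hyperparameter sketch; the per-dataset grouping follows the quoted text, and the dictionary layout is our own, not the released configuration files.

```python
# Training hyperparameters quoted above (outer-loop settings for Omniglot
# are assumed to follow the mini-ImageNet ones where the row is silent).
TRAIN_CONFIG = {
    "loss": "softmax cross entropy",
    "outer_optimizer": "Adam",
    "outer_lr": 1e-3,
    "mini-ImageNet": {"inner_lr": 0.01, "iterations": 30_000},
    "Omniglot":      {"inner_lr": 0.4,  "iterations": 3_000},   # FOMAML or SOMAML
}
```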