Meta Dropout: Learning to Perturb Latent Features for Generalization

Authors: Hae Beom Lee, Taewook Nam, Eunho Yang, Sung Ju Hwang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our method on few-shot classification datasets, whose results show that it significantly improves the generalization performance of the base model, and largely outperforms existing regularization methods such as information bottleneck, manifold mixup, and information dropout."
Researcher Affiliation | Collaboration | Hae Beom Lee¹, Taewook Nam¹, Eunho Yang¹,², Sung Ju Hwang¹,²; ¹KAIST, ²AITRICS, South Korea; {haebeom.lee,namsan,eunhoy,sjhwang82}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1 (Meta-training) and Algorithm 2 (Meta-testing) in Appendix A.
Open Source Code | No | The paper states "We used TensorFlow (Abadi et al., 2016) for all our implementations." but does not provide a link or an explicit statement about the availability of its own source code.
Open Datasets | Yes | "We validate our method on the following two benchmark datasets for few-shot classification. 1) Omniglot: This gray-scale hand-written character dataset consists of 1623 classes with 20 examples of size 28×28 for each class. Following the experimental setup of Vinyals et al. (2016), we use 1200 classes for meta-training, and the remaining 423 classes for meta-testing. ... 2) miniImageNet: This is a subset of ILSVRC-2012 (Deng et al., 2009), consisting of 100 classes with 600 examples of size 84×84 per class."
Dataset Splits | Yes | Omniglot: "Following the experimental setup of Vinyals et al. (2016), we use 1200 classes for meta-training, and the remaining 423 classes for meta-testing." ... miniImageNet: "There are 64, 16 and 20 classes for meta-train/validation/test respectively." (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper states "We used TensorFlow (Abadi et al., 2016) for all our implementations." but does not specify the TensorFlow version or any other software dependencies.
Experiment Setup | Yes | Omniglot: "For 1-shot classification, we use the meta-batch size of B = 8 and the inner-gradient stepsize of α = 0.1. For 5-shot classification, we use B = 6 and α = 0.4. We train for a total of 40K iterations with meta-learning rate 10⁻³." miniImageNet: "We use B = 4 and α = 0.01. We train for 60K iterations with meta-learning rate 10⁻⁴." Both datasets: "Each inner-optimization consists of 5 SGD steps for both meta-training and meta-testing. ... We use Adam optimizer (Kingma & Ba, 2014) with gradient clipping of [−3, 3]." (These hyperparameters are gathered into a config sketch after the table.)
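
The Open Datasets and Dataset Splits rows pin down the class-level partitions exactly, so they can be captured in a short configuration sketch. The snippet below is a minimal illustration only: the split sizes come from the quoted text, while the dictionary layout and the `split_classes` helper are assumptions, not anything released by the authors.

```python
# Minimal sketch of the class-level splits quoted in the table above.
# Split sizes come from the paper; the data structures and the
# split_classes helper are illustrative assumptions.

FEW_SHOT_SPLITS = {
    "omniglot": {          # 1623 classes, 20 examples each, 28x28 gray-scale
        "meta_train": 1200,
        "meta_test": 423,  # remaining classes (no validation split quoted)
    },
    "miniimagenet": {      # 100 classes, 600 examples each, 84x84
        "meta_train": 64,
        "meta_val": 16,
        "meta_test": 20,
    },
}


def split_classes(class_ids, sizes):
    """Partition a list of class ids according to the sizes above."""
    splits, start = {}, 0
    for name, count in sizes.items():
        splits[name] = class_ids[start:start + count]
        start += count
    assert start == len(class_ids), "split sizes must cover every class"
    return splits


if __name__ == "__main__":
    omniglot_splits = split_classes(list(range(1623)), FEW_SHOT_SPLITS["omniglot"])
    print({k: len(v) for k, v in omniglot_splits.items()})
    # -> {'meta_train': 1200, 'meta_test': 423}
```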
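Similarly, the Experiment Setup row lists every optimization hyperparameter the paper reports. The sketch below collects them in one place and shows element-wise value clipping to [−3, 3] as a plain function; the key names, the `clip_gradient` helper, and the overall structure are assumptions for illustration, not the authors' implementation (which is not publicly available).

```python
# Hyperparameters quoted in the Experiment Setup row, gathered into one
# configuration sketch. Only the values are taken from the paper; the key
# names and structure are assumptions.

META_DROPOUT_CONFIG = {
    "omniglot_1shot": {"meta_batch_size": 8, "inner_lr": 0.1,
                       "meta_lr": 1e-3, "iterations": 40_000},
    "omniglot_5shot": {"meta_batch_size": 6, "inner_lr": 0.4,
                       "meta_lr": 1e-3, "iterations": 40_000},
    "miniimagenet":   {"meta_batch_size": 4, "inner_lr": 0.01,
                       "meta_lr": 1e-4, "iterations": 60_000},
}

# Shared across both datasets (quoted above): 5 SGD steps per inner
# optimization, Adam as the meta-optimizer, gradients clipped to [-3, 3].
INNER_STEPS = 5
GRAD_CLIP_RANGE = (-3.0, 3.0)


def clip_gradient(g, low=GRAD_CLIP_RANGE[0], high=GRAD_CLIP_RANGE[1]):
    """Element-wise value clipping as reported in the paper (illustrative)."""
    return max(low, min(high, g))


if __name__ == "__main__":
    print(META_DROPOUT_CONFIG["omniglot_1shot"])
    print([clip_gradient(g) for g in (-5.2, 0.7, 4.1)])  # -> [-3.0, 0.7, 3.0]
```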