Meta Dropout: Learning to Perturb Latent Features for Generalization
Authors: Hae Beom Lee, Taewook Nam, Eunho Yang, Sung Ju Hwang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on few-shot classification datasets, whose results show that it significantly improves the generalization performance of the base model, and largely outperforms existing regularization methods such as information bottleneck, manifold mixup, and information dropout. |
| Researcher Affiliation | Collaboration | Hae Beom Lee1, Taewook Nam1, Eunho Yang1,2, Sung Ju Hwang1,2 (1KAIST, 2AITRICS, South Korea); {haebeom.lee,namsan,eunhoy,sjhwang82}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Meta-training and Algorithm 2 Meta-testing (Appendix A) |
| Open Source Code | No | The paper states "We used TensorFlow (Abadi et al., 2016) for all our implementations." but does not provide a link or an explicit statement about the availability of its own source code. |
| Open Datasets | Yes | We validate our method on the following two benchmark datasets for few-shot classification. 1) Omniglot: This gray-scale hand-written character dataset consists of 1623 classes with 20 examples of size 28×28 for each class. Following the experimental setup of Vinyals et al. (2016), we use 1200 classes for meta-training, and the remaining 423 classes for meta-testing. ... 2) miniImageNet: This is a subset of ILSVRC-2012 (Deng et al., 2009), consisting of 100 classes with 600 examples of size 84×84 per class. |
| Dataset Splits | Yes | Omniglot: Following the experimental setup of Vinyals et al. (2016), we use 1200 classes for meta-training, and the remaining 423 classes for meta-testing. ... miniImageNet: There are 64, 16 and 20 classes for meta-train/validation/test respectively. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper states "We used TensorFlow (Abadi et al., 2016) for all our implementations." but does not specify the TensorFlow version or any other software dependencies. |
| Experiment Setup | Yes | Omniglot: For 1-shot classification, we use a meta-batch size of B = 8 and an inner-gradient step size of α = 0.1. For 5-shot classification, we use B = 6 and α = 0.4. We train for a total of 40K iterations with meta-learning rate 10⁻³. miniImageNet: We use B = 4 and α = 0.01. We train for 60K iterations with meta-learning rate 10⁻⁴. Both datasets: Each inner-optimization consists of 5 SGD steps for both meta-training and meta-testing. ... We use the Adam optimizer (Kingma & Ba, 2014) with gradient clipping of [-3, 3]. |
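
Since the authors' TensorFlow implementation is not publicly released, the splits and hyperparameters quoted in the Dataset Splits and Experiment Setup rows are collected below as a minimal plain-Python sketch. All identifiers (`DATASET_SPLITS`, `TRAIN_CONFIGS`, `inner_lr`, ...) are illustrative assumptions, not names taken from the paper or its code.

```python
# Hedged sketch of the quoted dataset splits and training hyperparameters.
# Every identifier here is illustrative; none is taken from the authors' code.

DATASET_SPLITS = {
    # Omniglot: 1623 hand-written character classes, 20 examples each (28x28),
    # split following Vinyals et al. (2016).
    "omniglot": {"meta_train_classes": 1200, "meta_test_classes": 423},
    # miniImageNet: 100 ILSVRC-2012 classes, 600 examples each (84x84).
    "miniimagenet": {"meta_train_classes": 64,
                     "meta_val_classes": 16,
                     "meta_test_classes": 20},
}

TRAIN_CONFIGS = {
    "omniglot_1shot": dict(meta_batch_size=8, inner_lr=0.1,
                           meta_iterations=40_000, meta_lr=1e-3),
    "omniglot_5shot": dict(meta_batch_size=6, inner_lr=0.4,
                           meta_iterations=40_000, meta_lr=1e-3),
    "miniimagenet":   dict(meta_batch_size=4, inner_lr=0.01,
                           meta_iterations=60_000, meta_lr=1e-4),
}

INNER_SGD_STEPS = 5           # inner-optimization steps, meta-train and meta-test
ADAM_GRAD_CLIP = (-3.0, 3.0)  # gradient clipping on the Adam meta-updates
```

Under these settings, each Adam meta-update would aggregate gradients over `meta_batch_size` tasks, each adapted with `INNER_SGD_STEPS` inner SGD steps of size `inner_lr`, with gradients clipped to `ADAM_GRAD_CLIP`.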