AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Authors: Hieu Pham, Quoc V. Le (pp. 9351-9359)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this method works well for both image recognition on CIFAR-10 and ImageNet, as well as language modeling on Penn Treebank and WikiText-2. The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014. Our experiments show that AutoDropout can find dropout patterns that significantly improve commonly used ConvNet and Transformer architectures. On ImageNet, AutoDropout improves the top-1 accuracy of ResNet-50 from 76.5% to 78.7%, and EfficientNet-B7 from 84.1% to 84.7%.
Researcher Affiliation | Industry | Hieu Pham and Quoc V. Le, Google Research, Brain Team, Mountain View, CA 94043, {hyhieu,qvl}@google.com
Pseudocode | No | The paper describes the controller model and search algorithm, but does not provide explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Our code will be available at: https://github.com/google-research/google-research/tree/master/auto_dropout.
Open Datasets | Yes | For CIFAR-10, we use Wide ResNet 28-10 (WRN-28-10; Zagoruyko and Komodakis (2016)) because it is a common baseline on this dataset. For ImageNet, we consider ResNet-50 (He et al. 2016) because it is a common architecture for ImageNet. ... on the Penn Treebank dataset (PTB; Marcus et al. (1994)). ... language modeling on WikiText-2 (Merity et al. 2017).
Dataset Splits | Yes | For CIFAR-10, we search with a WRN-28-2 on the entire dataset, reserving 10% of the original training set for validation. For ImageNet, ... We use 80,000 examples for training and 5,000 examples for validation. ... PTB is a small dataset, with about 929K training tokens, 73K validation tokens, and 82K test tokens. (A minimal split sketch follows the table.)
Hardware Specification | Yes | On 4 TPU v2 chips, each of our runs takes about 40 minutes.
Software Dependencies | No | The paper refers to deep learning frameworks like TensorFlow and PyTorch but does not provide specific version numbers for ancillary software dependencies used in its experiments.
Experiment Setup | Yes | We train every configuration from scratch for 160,000 steps, using a batch size of 16 and a segment length of 70. We use the cosine learning rate schedule so that each trial converges to a reasonable perplexity. ... We train each dropout pattern on CIFAR-10 for 32,000 steps, and train each pattern on ImageNet for 16,000 steps. (A cosine schedule sketch follows the table.)
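
The Dataset Splits row states that 10% of the original CIFAR-10 training set is reserved for validation during the search. Below is a minimal sketch of such a split, not the authors' released code; the use of tensorflow_datasets and the exact slicing syntax are assumptions.

```python
# Sketch only: reserve 10% of the CIFAR-10 training set for validation,
# as quoted in the Dataset Splits row. Library choice is an assumption.
import tensorflow_datasets as tfds

# 45,000 images for training the proxy WRN-28-2 during the search,
# 5,000 images (10% of the original 50,000) held out for validation.
search_train_ds = tfds.load("cifar10", split="train[:90%]")
search_valid_ds = tfds.load("cifar10", split="train[90%:]")
```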
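
The Experiment Setup row quotes a step budget, batch size, segment length, and a cosine learning rate schedule for the Penn Treebank trials. The sketch below illustrates one way to wire such a schedule with tf.keras; the initial learning rate and the optimizer choice are assumptions not taken from the excerpt.

```python
# Sketch of a cosine learning rate schedule matching the quoted step budget.
import tensorflow as tf

TOTAL_STEPS = 160_000   # steps per configuration (from the excerpt)
BATCH_SIZE = 16         # batch size (from the excerpt)
SEGMENT_LENGTH = 70     # language-model segment length (from the excerpt)

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,  # assumed value; not given in the excerpt
    decay_steps=TOTAL_STEPS,
)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```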