AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Authors: Hieu Pham, Quoc V. Le
AAAI 2021, pp. 9351-9359
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this method works well for both image recognition on CIFAR-10 and ImageNet, as well as language modeling on Penn Treebank and WikiText-2. The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014. Our experiments show that AutoDropout can find dropout patterns that significantly improve commonly used ConvNet and Transformer architectures. On ImageNet, AutoDropout improves the top-1 accuracy of ResNet-50 from 76.5% to 78.7%, and EfficientNet-B7 from 84.1% to 84.7%. |
| Researcher Affiliation | Industry | Hieu Pham and Quoc V. Le, Google Research, Brain Team, Mountain View, CA 94043, {hyhieu,qvl}@google.com |
| Pseudocode | No | The paper describes the controller model and search algorithm in prose, but does not provide explicit pseudocode blocks or algorithms labeled as such. (A hedged sketch of what such a search loop might look like follows this table.) |
| Open Source Code | Yes | Our code will be available at: https://github.com/google-research/google-research/tree/master/auto_dropout. |
| Open Datasets | Yes | For CIFAR-10, we use Wide ResNet 28-10 (WRN-28-10; Zagoruyko and Komodakis (2016)) because it is a common baseline on this dataset. For ImageNet, we consider ResNet-50 (He et al. 2016) because it is a common architecture for ImageNet. ... on the Penn Treebank dataset (PTB; Marcus et al. (1994)). ... language modeling on WikiText-2 (Merity et al. 2017). |
| Dataset Splits | Yes | For CIFAR-10, we search with a WRN-28-2 on the entire dataset, reserving 10% of the original training set for validation. For ImageNet, ... We use 80,000 examples for training and 5,000 examples for validation. ... PTB is a small dataset, with about 929K training tokens, 73K validation tokens, and 82K test tokens. (A toy illustration of the CIFAR-10 split appears after this table.) |
| Hardware Specification | Yes | On 4 TPU v2 chips, each of our runs takes about 40 minutes. |
| Software Dependencies | No | The paper refers to deep learning frameworks like TensorFlow and PyTorch but does not provide specific version numbers for ancillary software dependencies used in its experiments. |
| Experiment Setup | Yes | We train every configuration from scratch for 160,000 steps, using a batch size of 16 and a segment length of 70. We use the cosine learning rate schedule so that each trial converges to a reasonable perplexity. ... We train each dropout pattern on CIFAR-10 for 32,000 steps, and train each pattern on ImageNet for 16,000 steps. (A minimal sketch of the cosine schedule appears after this table.) |
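
As noted in the Pseudocode row, the paper describes its search algorithm only in prose: a controller samples a structured dropout pattern, a child model is trained with that pattern, and the resulting validation performance serves as the controller's reward signal. The Python sketch below is purely illustrative of that loop; `Controller`, `train_child`, and every other name in it are hypothetical stand-ins, not the authors' code.

```python
import math
import random

# Hypothetical sketch of the AutoDropout search loop described in prose.
# Controller and train_child are toy stand-ins, not the authors' code.

class Controller:
    """Toy stand-in for the paper's learned controller."""
    def sample_pattern(self):
        # A "pattern" here is just (drop_rate, block_size); the real search
        # space is richer (sizes, strides, tiling, rotation, etc.).
        pattern = (random.choice([0.1, 0.3, 0.5]), random.choice([1, 4, 8]))
        log_prob = math.log(1.0 / 9.0)  # uniform over the 9 toy choices
        return pattern, log_prob

    def reinforce_update(self, log_prob, advantage):
        pass  # a REINFORCE gradient step on the controller would go here

def train_child(pattern):
    """Dummy child-model training; returns a fake validation accuracy."""
    return random.uniform(0.6, 0.8)

def autodropout_search(controller, num_trials=16, baseline_decay=0.95):
    baseline, best = 0.0, (None, -1.0)
    for _ in range(num_trials):
        pattern, log_prob = controller.sample_pattern()
        reward = train_child(pattern)  # validation performance as reward
        controller.reinforce_update(log_prob, reward - baseline)
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
        if reward > best[1]:
            best = (pattern, reward)
    return best

print(autodropout_search(Controller()))
```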
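
The Dataset Splits row quotes a 90/10 split of the CIFAR-10 training set for the search phase. A toy illustration of such a split, assuming the standard 50,000-example training set and an arbitrary seed:

```python
import numpy as np

# Hold out 10% of CIFAR-10's 50,000 training examples for validation,
# mirroring the split quoted above. The seed is an arbitrary assumption.
rng = np.random.default_rng(0)
indices = rng.permutation(50_000)
val_idx, train_idx = indices[:5_000], indices[5_000:]
print(len(train_idx), len(val_idx))  # 45000 5000
```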
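
The Experiment Setup row mentions a cosine learning rate schedule over the 160,000-step language-modeling trials. A minimal sketch of such a schedule; the peak learning rate and the absence of warmup are assumptions, not values from the paper:

```python
import math

def cosine_lr(step, total_steps=160_000, peak_lr=0.1):
    """Cosine decay from peak_lr (an assumed value) to 0 over total_steps."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

for s in (0, 80_000, 160_000):
    print(s, round(cosine_lr(s), 4))  # 0.1 at the start, 0.05 midway, 0.0 at the end
```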