AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Authors: Hieu Pham, Quoc V. Le (pp. 9351-9359)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this method works well for both image recognition on CIFAR-10 and ImageNet, as well as language modeling on Penn Treebank and WikiText-2. The learned dropout patterns also transfer to different tasks and datasets, such as from language modeling on Penn Treebank to English-French translation on WMT 2014. Our experiments show that AutoDropout can find dropout patterns that significantly improve commonly used ConvNet and Transformer architectures. On ImageNet, AutoDropout improves the top-1 accuracy of ResNet-50 from 76.5% to 78.7%, and EfficientNet-B7 from 84.1% to 84.7%.
Researcher Affiliation | Industry | Hieu Pham and Quoc V. Le, Google Research, Brain Team, Mountain View, CA 94043, {hyhieu,qvl}@google.com
Pseudocode | No | The paper describes the controller model and search algorithm, but does not provide explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Our code will be available at: https://github.com/google-research/google-research/tree/master/auto_dropout.
Open Datasets | Yes | For CIFAR-10, we use Wide ResNet 28-10 (WRN-28-10; Zagoruyko and Komodakis (2016)) because it is a common baseline on this dataset. For ImageNet, we consider ResNet-50 (He et al. 2016) because it is a common architecture for ImageNet. ... on the Penn Treebank dataset (PTB; Marcus et al. (1994)). ... language modeling on WikiText-2 (Merity et al. 2017).
Dataset Splits | Yes | For CIFAR-10, we search with a WRN-28-2 on the entire dataset, reserving 10% of the original training set for validation. For ImageNet, ... We use 80,000 examples for training and 5,000 examples for validation. ... PTB is a small dataset, with about 929K training tokens, 73K validation tokens, and 82K test tokens. (A minimal split sketch follows the table.)
Hardware Specification | Yes | On 4 TPU v2 chips, each of our runs takes about 40 minutes.
Software Dependencies | No | The paper refers to deep learning frameworks like TensorFlow and PyTorch but does not provide specific version numbers for ancillary software dependencies used in its experiments.
Experiment Setup | Yes | We train every configuration from scratch for 160,000 steps, using a batch size of 16 and a segment length of 70. We use the cosine learning rate schedule so that each trial converges to a reasonable perplexity. ... We train each dropout pattern on CIFAR-10 for 32,000 steps, and train each pattern on ImageNet for 16,000 steps. (A cosine schedule sketch follows the table.)
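
The Dataset Splits row states that 10% of the original CIFAR-10 training set is reserved for validation during the search. Below is a minimal sketch of such a split, not the authors' released code; the use of tensorflow_datasets and the exact slicing syntax are assumptions.

```python
# Sketch only: reserve 10% of the CIFAR-10 training set for validation,
# as quoted in the Dataset Splits row. Library choice is an assumption.
import tensorflow_datasets as tfds

# 45,000 images for training the proxy WRN-28-2 during the search,
# 5,000 images (10% of the original 50,000) held out for validation.
search_train_ds = tfds.load("cifar10", split="train[:90%]")
search_valid_ds = tfds.load("cifar10", split="train[90%:]")
```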
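
The Experiment Setup row quotes a step budget, batch size, segment length, and a cosine learning rate schedule for the Penn Treebank trials. The sketch below illustrates one way to wire such a schedule with tf.keras; the initial learning rate and the optimizer choice are assumptions not taken from the excerpt.

```python
# Sketch of a cosine learning rate schedule matching the quoted step budget.
import tensorflow as tf

TOTAL_STEPS = 160_000   # steps per configuration (from the excerpt)
BATCH_SIZE = 16         # batch size (from the excerpt)
SEGMENT_LENGTH = 70     # language-model segment length (from the excerpt)

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,  # assumed value; not given in the excerpt
    decay_steps=TOTAL_STEPS,
)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```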