Improved Dropout for Shallow and Deep Learning

Authors: Zhe Li, Boqing Gong, Tianbao Yang

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence but also a smaller testing error than the standard dropout.
Researcher Affiliation | Academia | Zhe Li (1), Boqing Gong (2), Tianbao Yang (1); (1) The University of Iowa, Iowa City, IA 52245; (2) University of Central Florida, Orlando, FL 32816; {zhe-li-1,tianbao-yang}@uiowa.edu, bgong@crcv.ucf.edu
Pseudocode | Yes | Figure 1: Evolutional Dropout applied to a layer over a mini-batch (a hedged sketch of this procedure is given after the table below).
Open Source Code | No | The paper mentions using external libraries such as the cuda-convnet library and Caffe (with links to their general GitHub repositories) for conducting the experiments, but it does not state that the source code for the proposed method is publicly released.
Open Datasets | Yes | We use three data sets: real-sim, news20, and RCV1 (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). We conduct experiments on four benchmark data sets for comparing e-dropout and s-dropout: MNIST [10], SVHN [11], CIFAR-10 and CIFAR-100 [8].
Dataset Splits | Yes | We use the default split of training and testing data in all data sets. We directly optimize the neural networks using all training images, without further splitting them into a validation set to be added back into training in later stages, which explains some marginal gaps from the literature results that we observed (e.g., on CIFAR-10 compared with [19]).
Hardware Specification | No | The paper states 'All the experiments are conducted using the cuda-convnet library', which implies GPU usage, but no specific hardware details such as GPU models, CPU types, or memory specifications are provided.
Software Dependencies | No | The paper mentions 'All the experiments are conducted using the cuda-convnet library' and 'For batch normalization, we use the implementation in Caffe', but it does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | In all experiments, we set δ = 0.5 in the standard dropout and k = 0.5d in the proposed dropouts for a fair comparison, where d is the number of features or neurons of the layer that dropout is applied to. The rectified linear activation function is used for all neurons. All the experiments are conducted using the cuda-convnet library. The training procedure is similar to [9], using mini-batch SGD with momentum (0.9). The mini-batch size is fixed to 128. The weights are initialized from a Gaussian distribution with mean zero and standard deviation 0.01. The learning rate (i.e., step size) is decreased after a number of epochs, similar to previous work [9]. We tune the initial learning rates for s-dropout and e-dropout separately from 0.001, 0.005, 0.01, 0.1 and report the best result on each data set that yields the fastest convergence. (A compact summary of this configuration is sketched after the table.)
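
To make the Pseudocode row more concrete, below is a minimal NumPy sketch of the evolutional-dropout idea described in the paper: sampling probabilities are recomputed from the second-order statistics of each mini-batch, a multinomial mask with k draws is sampled, and kept units are rescaled so the layer output stays unbiased in expectation. The function name `evolutional_dropout_layer`, the per-example masks, and the epsilon floor are assumptions for illustration; the authoritative procedure is the paper's Figure 1.

```python
import numpy as np

def evolutional_dropout_layer(X, keep_fraction=0.5, eps=1e-12, rng=None):
    """Hedged sketch of evolutional dropout for one layer and one mini-batch.

    X             : mini-batch of layer outputs, shape (batch_size, d)
    keep_fraction : expected fraction of units kept (k = keep_fraction * d)
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_size, d = X.shape
    k = int(keep_fraction * d)

    # Second-order statistics of each unit, recomputed on every mini-batch
    # ("evolutional" because they evolve with the layer's output distribution).
    stats = np.sqrt((X ** 2).mean(axis=0)) + eps

    # Sampling probabilities proportional to those statistics.
    p = stats / stats.sum()

    # One multinomial mask per example: counts[i, j] is how many of the k
    # draws landed on unit j; most entries are zero, i.e. dropped out.
    counts = rng.multinomial(k, p, size=batch_size)

    # Rescale by 1 / (k * p) so that E[mask] = 1 and the masked output is an
    # unbiased estimate of the original layer output.
    return X * counts / (k * p)
```

By contrast, standard (s-)dropout with δ = 0.5 would use a fixed Bernoulli(0.5) mask scaled by 2 regardless of the data; the data-dependent probabilities above are what the abstract credits with faster convergence and lower testing error.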
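The Experiment Setup row fixes the main training hyperparameters in prose. Purely as a convenience, they can be collected into a single configuration object; the values below come from the quoted text, but the dictionary layout and key names are illustrative assumptions, not taken from the authors' (unreleased) code.

```python
# Hedged summary of the training setup quoted above; values are from the
# paper's text, while the structure and key names are assumptions.
TRAINING_SETUP = {
    "dropout_delta": 0.5,            # delta = 0.5 for standard dropout (s-dropout)
    "dropout_k_fraction": 0.5,       # k = 0.5 * d for the proposed dropouts (e-dropout)
    "activation": "relu",            # rectified linear units for all neurons
    "optimizer": "sgd",              # mini-batch SGD
    "momentum": 0.9,
    "batch_size": 128,
    "weight_init": {"distribution": "gaussian", "mean": 0.0, "std": 0.01},
    "initial_lr_candidates": [0.001, 0.005, 0.01, 0.1],  # tuned separately per method
    "lr_schedule": "decrease after a number of epochs, as in [9]",
}
```

For each data set, the initial learning rate yielding the fastest convergence is the one whose result is reported.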