Improved Dropout for Shallow and Deep Learning

Authors: Zhe Li, Boqing Gong, Tianbao Yang

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence but also a smaller testing error than the standard dropout.
Researcher Affiliation | Academia | Zhe Li (1), Boqing Gong (2), Tianbao Yang (1); (1) The University of Iowa, Iowa City, IA 52245; (2) University of Central Florida, Orlando, FL 32816; {zhe-li-1,tianbao-yang}@uiowa.edu, bgong@crcv.ucf.edu
Pseudocode | Yes | Figure 1: Evolutional Dropout applied to a layer over a mini-batch (a hedged sketch of this procedure is given after the table below).
Open Source Code | No | The paper mentions using external libraries such as the cuda-convnet library and Caffe (with links to their general GitHub repositories) for conducting the experiments, but it does not state that the source code for the proposed method is publicly released.
Open Datasets | Yes | We use three data sets: real-sim, news20, and RCV1 (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). We conduct experiments on four benchmark data sets for comparing e-dropout and s-dropout: MNIST [10], SVHN [11], CIFAR-10 and CIFAR-100 [8].
Dataset Splits | Yes | We use the default split of training and testing data in all data sets. We directly optimize the neural networks using all training images, without further splitting them into a validation set to be added back into training in later stages, which explains some marginal gaps from the literature results that we observed (e.g., on CIFAR-10 compared with [19]).
Hardware Specification | No | The paper states 'All the experiments are conducted using the cuda-convnet library', which implies GPU usage, but no specific hardware details such as GPU models, CPU types, or memory specifications are provided.
Software Dependencies | No | The paper mentions 'All the experiments are conducted using the cuda-convnet library' and 'For batch normalization, we use the implementation in Caffe', but it does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | In all experiments, we set δ = 0.5 in the standard dropout and k = 0.5d in the proposed dropouts for a fair comparison, where d is the number of features or neurons of the layer that dropout is applied to. The rectified linear activation function is used for all neurons. All the experiments are conducted using the cuda-convnet library. The training procedure is similar to [9], using mini-batch SGD with momentum (0.9). The mini-batch size is fixed to 128. The weights are initialized from a Gaussian distribution with mean zero and standard deviation 0.01. The learning rate (i.e., step size) is decreased after a number of epochs, similar to previous work [9]. We tune the initial learning rates for s-dropout and e-dropout separately from 0.001, 0.005, 0.01, 0.1 and report the best result on each data set that yields the fastest convergence. (A compact summary of this configuration is sketched after the table.)
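
To make the Pseudocode row more concrete, below is a minimal NumPy sketch of the evolutional-dropout idea described in the paper: sampling probabilities are recomputed from the second-order statistics of each mini-batch, a multinomial mask with k draws is sampled, and kept units are rescaled so the layer output stays unbiased in expectation. The function name `evolutional_dropout_layer`, the per-example masks, and the epsilon floor are assumptions for illustration; the authoritative procedure is the paper's Figure 1.

```python
import numpy as np

def evolutional_dropout_layer(X, keep_fraction=0.5, eps=1e-12, rng=None):
    """Hedged sketch of evolutional dropout for one layer and one mini-batch.

    X             : mini-batch of layer outputs, shape (batch_size, d)
    keep_fraction : expected fraction of units kept (k = keep_fraction * d)
    """
    rng = np.random.default_rng() if rng is None else rng
    batch_size, d = X.shape
    k = int(keep_fraction * d)

    # Second-order statistics of each unit, recomputed on every mini-batch
    # ("evolutional" because they evolve with the layer's output distribution).
    stats = np.sqrt((X ** 2).mean(axis=0)) + eps

    # Sampling probabilities proportional to those statistics.
    p = stats / stats.sum()

    # One multinomial mask per example: counts[i, j] is how many of the k
    # draws landed on unit j; most entries are zero, i.e. dropped out.
    counts = rng.multinomial(k, p, size=batch_size)

    # Rescale by 1 / (k * p) so that E[mask] = 1 and the masked output is an
    # unbiased estimate of the original layer output.
    return X * counts / (k * p)
```

By contrast, standard (s-)dropout with δ = 0.5 would use a fixed Bernoulli(0.5) mask scaled by 2 regardless of the data; the data-dependent probabilities above are what the abstract credits with faster convergence and lower testing error.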
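The Experiment Setup row fixes the main training hyperparameters in prose. Purely as a convenience, they can be collected into a single configuration object; the values below come from the quoted text, but the dictionary layout and key names are illustrative assumptions, not taken from the authors' (unreleased) code.

```python
# Hedged summary of the training setup quoted above; values are from the
# paper's text, while the structure and key names are assumptions.
TRAINING_SETUP = {
    "dropout_delta": 0.5,            # delta = 0.5 for standard dropout (s-dropout)
    "dropout_k_fraction": 0.5,       # k = 0.5 * d for the proposed dropouts (e-dropout)
    "activation": "relu",            # rectified linear units for all neurons
    "optimizer": "sgd",              # mini-batch SGD
    "momentum": 0.9,
    "batch_size": 128,
    "weight_init": {"distribution": "gaussian", "mean": 0.0, "std": 0.01},
    "initial_lr_candidates": [0.001, 0.005, 0.01, 0.1],  # tuned separately per method
    "lr_schedule": "decrease after a number of epochs, as in [9]",
}
```

For each data set, the initial learning rate yielding the fastest convergence is the one whose result is reported.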