Improved Dropout for Shallow and Deep Learning
Authors: Zhe Li, Boqing Gong, Tianbao Yang
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence but also a smaller testing error than the standard dropout. |
| Researcher Affiliation | Academia | Zhe Li (1), Boqing Gong (2), Tianbao Yang (1); (1) The University of Iowa, Iowa City, IA 52245; (2) University of Central Florida, Orlando, FL 32816; {zhe-li-1,tianbao-yang}@uiowa.edu, bgong@crcv.ucf.edu |
| Pseudocode | Yes | Figure 1: Evolutional Dropout applied to a layer over a mini-batch (a code sketch of this procedure is given after the table). |
| Open Source Code | No | The paper mentions using external libraries like 'cuda-convnet library' and 'Caffe' (with links to their general GitHub repositories) for conducting experiments, but does not state that the source code for their own proposed methodology is publicly released. |
| Open Datasets | Yes | We use the three data sets: real-sim, news20 and RCV1 (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). We conduct experiments on four benchmark data sets for comparing e-dropout and s-dropout: MNIST [10], SVHN [11], CIFAR-10 and CIFAR-100 [8]. |
| Dataset Splits | Yes | We use the default training/testing split of all data sets. We directly optimize the neural networks using all training images without further splitting them into a validation set to be added into the training in later stages, which explains some marginal gaps from the literature results that we observed (e.g., on CIFAR-10 compared with [19]). |
| Hardware Specification | No | The paper mentions 'All the experiments are conducted using the cuda-convnet library.', which implies GPU usage, but no specific hardware details such as GPU models, CPU types, or memory specifications are provided. |
| Software Dependencies | No | The paper mentions 'All the experiments are conducted using the cuda-convnet library.' and 'For batch normalization, we use the implementation in Caffe.', but it does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In all experiments, we set δ = 0.5 in the standard dropout and k = 0.5d in the proposed dropouts for fair comparison, where d is the number of features or neurons of the layer that dropout is applied to. The rectified linear activation function is used for all neurons. All the experiments are conducted using the cuda-convnet library. The training procedure is similar to [9], using mini-batch SGD with momentum (0.9). The mini-batch size is fixed to 128. The weights are initialized from a Gaussian distribution with mean zero and standard deviation 0.01. The learning rate (i.e., step size) is decreased after a number of epochs, similar to previous works [9]. We tune the initial learning rates for s-dropout and e-dropout separately over {0.001, 0.005, 0.01, 0.1} and report the best result on each data set that yields the fastest convergence. (These settings are also summarized in an illustrative configuration after the code sketch below.) |
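
To make the pseudocode referenced in the Pseudocode row concrete, the following NumPy sketch applies evolutional dropout to one layer's outputs over a mini-batch: sampling probabilities are taken proportional to the square roots of the per-feature second-order statistics of the current mini-batch, a multinomial mask is drawn per example, and kept features are rescaled so the dropped output is unbiased in expectation. This is a reconstruction from the paper's description (its own code is not released); the function name and interface are hypothetical.

```python
import numpy as np


def evolutional_dropout(X, k, rng=None):
    """Sketch of evolutional dropout for one layer over a mini-batch.

    X : array of shape (d, m) -- d features/neurons, m examples in the mini-batch.
    k : expected number of features kept per example (the paper uses k = 0.5 * d).
    """
    rng = np.random.default_rng() if rng is None else rng
    d, m = X.shape
    k = int(round(k))

    # Per-feature second-order statistic over the mini-batch; sampling
    # probabilities are proportional to its square root.
    scale = np.sqrt((X ** 2).mean(axis=1)) + 1e-12
    p = scale / scale.sum()                      # probabilities, sum to 1

    # For each example, draw k features with replacement (multinomial mask),
    # then rescale coordinate i by 1 / (k * p_i) so that E[mask_i] = 1.
    counts = rng.multinomial(k, p, size=m).T     # shape (d, m), integer counts
    mask = counts / (k * p[:, None])

    return X * mask


# Hypothetical usage: a layer with d = 256 neurons, mini-batch of m = 128,
# keeping k = 0.5 * d features in expectation, as in the paper's experiments.
X = np.random.default_rng(0).standard_normal((256, 128))
X_dropped = evolutional_dropout(X, k=0.5 * 256)
```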
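
The training settings reported in the Experiment Setup row can be collected into a single illustrative configuration. The dictionary below only summarizes the hyperparameters quoted above; the key names are ours, not taken from any released code.

```python
# Illustrative summary of the reported experiment setup (names are hypothetical).
EXPERIMENT_SETUP = {
    "standard_dropout_delta": 0.5,        # delta = 0.5 in the standard dropout
    "kept_fraction": 0.5,                 # k = 0.5 * d in the proposed dropouts
    "activation": "relu",                 # rectified linear units for all neurons
    "optimizer": "sgd",
    "momentum": 0.9,
    "batch_size": 128,
    "weight_init": {"distribution": "gaussian", "mean": 0.0, "std": 0.01},
    "initial_learning_rate_grid": [0.001, 0.005, 0.01, 0.1],
    "lr_schedule": "step decay after a number of epochs (as in [9])",
}
```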