Dropout with Expectation-linear Regularization
Authors: Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yaoliang Yu, Yuntian Deng, Eduard Hovy
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we first formulate dropout as a tractable approximation of a latent variable model... Experimentally, through three benchmark datasets we show that our regularized dropout is not only as simple and efficient as standard dropout but also consistently leads to improved performance. (An illustrative sketch of the expectation-linear regularizer follows the table.) |
| Researcher Affiliation | Academia | Xuezhe Ma, Yingkai Gao (Language Technologies Institute, Carnegie Mellon University) {xuezhem, yingkaig}@cs.cmu.edu; Zhiting Hu, Yaoliang Yu (Machine Learning Department, Carnegie Mellon University) {zhitinghu, yaoliang}@cs.cmu.edu; Yuntian Deng (School of Engineering and Applied Sciences, Harvard University) dengyuntian@gmail.com; Eduard Hovy (Language Technologies Institute, Carnegie Mellon University) hovy@cmu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Experiments on three image classification benchmark datasets demonstrate that reducing the inference gap can indeed improve the performance consistently. ... The MNIST dataset (Le Cun et al., 1998) consists of 70,000 handwritten digit images of size 28 × 28, where 60,000 images are used for training and the rest for testing. ... The CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) consist of 60,000 color images of size 32 × 32... 50,000 images are used for training and the rest for testing. |
| Dataset Splits | Yes | For each dataset, we held out 10,000 random training images for validation to tune the hyper-parameters, including λ in Eq. (15). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For all architectures, we used dropout rate p = 0.5 for all hidden layers and p = 0.2 for the input layer. ... Neural network training in all the experiments is performed with mini-batch stochastic gradient descent (SGD) with momentum. We choose an initial learning rate of η0, and the learning rate is updated on each epoch of training as ηt = η0/(1 + ρt), where ρ is the decay rate and t is the number of epochs completed. We run each experiment for 2,000 epochs... Table 3: Hyper-parameters for all experiments. (A sketch of this learning-rate schedule also appears after the table.) |
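
The Research Type row quotes the paper's core idea: penalize the gap between the sampled-dropout output and the output obtained with the expected (deterministic) dropout mask. Below is a minimal PyTorch sketch of that idea, assuming a simple fully connected MNIST classifier with the quoted dropout rates (p = 0.2 on the input, p = 0.5 on the hidden layer). The architecture, hidden width, and single-sample Monte Carlo estimate of the gap are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutMLP(nn.Module):
    """Hypothetical fully connected MNIST classifier with the quoted dropout
    rates: p = 0.2 on the input layer and p = 0.5 on the hidden layer."""

    def __init__(self, d_in=784, d_hidden=1024, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, n_classes)
        self.drop_in = nn.Dropout(p=0.2)
        self.drop_hidden = nn.Dropout(p=0.5)

    def forward(self, x):
        # In eval mode nn.Dropout is the identity, which matches standard
        # dropout inference (the "expected mask" forward pass).
        h = F.relu(self.fc1(self.drop_in(x)))
        return self.fc2(self.drop_hidden(h))


def expectation_linear_loss(model, x, y, lam):
    """Standard dropout cross-entropy plus lam times the squared gap between
    the sampled-dropout output and the expected-mask output (one Monte Carlo
    sample per example; a simplification of the paper's estimator)."""
    model.train()                     # sample dropout masks
    logits_sampled = model(x)
    ce = F.cross_entropy(logits_sampled, y)

    model.eval()                      # expected masks, i.e. no dropout noise
    logits_expected = model(x)
    model.train()

    gap = F.mse_loss(logits_sampled, logits_expected)
    return ce + lam * gap


# Tiny usage example with random data shaped like flattened MNIST digits.
if __name__ == "__main__":
    model = DropoutMLP()
    x = torch.randn(8, 784)
    y = torch.randint(0, 10, (8,))
    loss = expectation_linear_loss(model, x, y, lam=1.0)
    loss.backward()
```

Setting lam = 0 recovers plain dropout training; the paper tunes the regularization weight (its λ) on the held-out validation split noted in the Dataset Splits row.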
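The Experiment Setup row quotes mini-batch SGD with momentum and the per-epoch decay ηt = η0/(1 + ρt). The sketch below shows one way to apply that schedule with torch.optim.SGD; the values of η0, ρ, the momentum, and the stand-in model are hypothetical, since the paper reports its actual hyper-parameters per dataset in its Table 3.

```python
import torch
import torch.nn as nn


def lr_at_epoch(eta0, rho, t):
    """Quoted schedule: eta_t = eta_0 / (1 + rho * t), with t the number of
    completed epochs."""
    return eta0 / (1.0 + rho * t)


# Hypothetical values; the paper's actual eta_0, rho, momentum, and lambda
# are reported per dataset in its Table 3.
eta0, rho, momentum, n_epochs = 0.1, 0.01, 0.9, 2000

model = nn.Linear(784, 10)  # stand-in for the dropout network sketched above
optimizer = torch.optim.SGD(model.parameters(), lr=eta0, momentum=momentum)

for epoch in range(n_epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(eta0, rho, epoch)  # decay once per epoch
    # ... one pass of mini-batch SGD over the training set would go here ...
```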