Adaptive Dropout with Rademacher Complexity Regularization
Authors: Ke Zhai, Huan Wang
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the task of image and document classification also show our method achieves better performance compared to the state-of-the-art dropout algorithms. |
| Researcher Affiliation | Industry | Ke Zhai, Microsoft AI & Research, Sunnyvale, CA (kezhai@microsoft.com); Huan Wang, Salesforce Research, Palo Alto, CA (joyousprince@gmail.com) |
| Pseudocode | No | The paper describes algorithmic steps and mathematical formulations but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or include a link to a code repository for the methodology described. |
| Open Datasets | Yes | MNIST dataset is a collection of 28×28 pixel hand-written digit images in grayscale, containing 60K for training and 10K for testing. |
| Dataset Splits | Yes | For all datasets, we hold out 20% of the training data as validation set for parameter tuning and model selection. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as particular CPU or GPU models, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using standard machine learning concepts and models but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We optimize categorical cross-entropy loss on predicted class labels with Rademacher regularization. ... We update the parameters using mini-batch stochastic gradient descent with Nesterov momentum of 0.95. ... For the Rademacher complexity term, we perform a grid search on the regularization weight λ ∈ {0.05, 0.01, 0.005, 0.001, 1e-4, 1e-5}, and update the dropout rates after every I ∈ {1, 5, 10, 50, 100} minibatches. ... We use a learning rate of 0.01 and decay it by 0.5 after every {300, 400, 500} epochs. ... initializing the retaining rates to 0.8 for input layer and 0.5 for hidden layer yields better performance for all models. |
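
The experiment-setup, dataset, and split rows above describe a fairly standard training configuration: a 20% validation hold-out from the MNIST training set, cross-entropy loss plus a Rademacher regularization term, and mini-batch SGD with Nesterov momentum 0.95, learning rate 0.01 decayed by 0.5 on a fixed epoch schedule. The following is a minimal sketch of how such a setup could be wired together, assuming PyTorch; the batch size, epoch count, network architecture, `rademacher_penalty` placeholder, and dropout-rate update hook are all assumptions for illustration, since the paper does not release code or specify them in the excerpts above.

```python
# Sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# MNIST: 60K training / 10K test images of 28x28 grayscale digits.
train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
val_size = int(0.2 * len(train_full))  # hold out 20% of training data for validation
train_set, val_set = random_split(train_full, [len(train_full) - val_size, val_size])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size assumed

# Placeholder classifier; the paper evaluates networks with adaptive dropout,
# initializing retain rates to 0.8 (input layer) and 0.5 (hidden layers).
model = nn.Sequential(nn.Flatten(), nn.Dropout(1 - 0.8),
                      nn.Linear(784, 800), nn.ReLU(),
                      nn.Dropout(1 - 0.5), nn.Linear(800, 10))

criterion = nn.CrossEntropyLoss()
# Mini-batch SGD with Nesterov momentum 0.95, learning rate 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.95, nesterov=True)
# Decay the learning rate by 0.5 every 300 epochs (the paper searches {300, 400, 500}).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=300, gamma=0.5)

lam = 0.001           # regularization weight; paper searches {0.05, ..., 1e-5}
update_interval = 10  # update dropout rates every I minibatches, I in {1, 5, 10, 50, 100}

def rademacher_penalty(model):
    # Placeholder: the paper's bound depends on layer weights and retain rates;
    # returning zero here just keeps the sketch runnable.
    return torch.tensor(0.0)

for epoch in range(600):  # epoch budget assumed
    for step, (x, y) in enumerate(train_loader):
        loss = criterion(model(x), y) + lam * rademacher_penalty(model)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (step + 1) % update_interval == 0:
            pass  # the adaptive dropout-rate update would go here
    scheduler.step()
```

The Rademacher complexity bound and the adaptive dropout-rate update are the paper's actual contribution and are not reproduced here; the sketch only shows the surrounding optimization loop implied by the quoted setup.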