Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adaptive Dropout with Rademacher Complexity Regularization
Authors: Ke Zhai, Huan Wang
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the task of image and document classification also show our method achieves better performance compared to the state-of-the-art dropout algorithms. |
| Researcher Affiliation | Industry | Ke Zhai, Microsoft AI & Research, Sunnyvale, CA; Huan Wang, Salesforce Research, Palo Alto, CA |
| Pseudocode | No | The paper describes algorithmic steps and mathematical formulations but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or include a link to a code repository for the methodology described. |
| Open Datasets | Yes | MNIST dataset is a collection of 28×28-pixel hand-written digit images in grayscale, containing 60K for training and 10K for testing. |
| Dataset Splits | Yes | For all datasets, we hold out 20% of the training data as validation set for parameter tuning and model selection. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as particular CPU or GPU models, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using standard machine learning concepts and models but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We optimize categorical cross-entropy loss on predicted class labels with Rademacher regularization. ... We update the parameters using mini-batch stochastic gradient descent with Nesterov momentum of 0.95. ... For Rademacher complexity term, we perform a grid search on the regularization weight λ ∈ {0.05, 0.01, 0.005, 0.001, 1e-4, 1e-5}, and update the dropout rates after every I ∈ {1, 5, 10, 50, 100} minibatches. ... We use a learning rate of 0.01 and decay it by 0.5 after every {300, 400, 500} epochs. ... initializing the retaining rates to 0.8 for input layer and 0.5 for hidden layer yields better performance for all models. |
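Because the paper releases no code, the following is only a minimal Python sketch of the optimization settings quoted in the Dataset Splits and Experiment Setup rows, not the authors' implementation. It assumes that λ and I are searched jointly as a grid, that the 20% validation hold-out is drawn uniformly at random, and that Nesterov momentum takes the common look-ahead form. All function names (`search_grid`, `learning_rate`, `train_val_split`, `nesterov_step`) are hypothetical. The Rademacher regularizer and the adaptive dropout-rate update themselves are not implemented, and since the excerpt does not say which decay period applies to which dataset, `period` is left as an argument.

```python
import itertools
import random

# Values quoted in the Experiment Setup row; how they are combined is an assumption.
LAMBDAS = [0.05, 0.01, 0.005, 0.001, 1e-4, 1e-5]  # Rademacher regularization weight
UPDATE_INTERVALS = [1, 5, 10, 50, 100]  # I: minibatches between dropout-rate updates
DECAY_PERIODS = [300, 400, 500]  # candidate epochs between learning-rate halvings
INITIAL_RETAIN = {"input": 0.8, "hidden": 0.5}  # initial retaining (keep) rates
MOMENTUM = 0.95
BASE_LR = 0.01


def search_grid():
    """All (lambda, I) pairs, assuming both are searched jointly."""
    return list(itertools.product(LAMBDAS, UPDATE_INTERVALS))


def learning_rate(epoch, period, base_lr=BASE_LR, factor=0.5):
    """Step decay: scale base_lr by `factor` once per completed `period` epochs."""
    return base_lr * factor ** (epoch // period)


def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle, then hold out `val_fraction` of the training data for validation
    (uniform random sampling is an assumption)."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_val = round(len(examples) * val_fraction)
    return examples[n_val:], examples[:n_val]


def nesterov_step(w, v, grad_fn, lr, mu=MOMENTUM):
    """One Nesterov-momentum SGD step in look-ahead form (assumed variant):
    v <- mu * v - lr * grad(w + mu * v);  w <- w + v
    """
    lookahead = [wi + mu * vi for wi, vi in zip(w, v)]
    g = grad_fn(lookahead)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v


if __name__ == "__main__":
    print(len(search_grid()))  # 30 configurations
    print([learning_rate(e, period=300) for e in (0, 299, 300, 600)])
    train, val = train_val_split(range(60_000))  # MNIST-sized training set
    print(len(train), len(val))  # 48000 12000
```

Keeping the decay period outside the grid reflects the quoted text, which lists three periods without saying they were tuned; if they were in fact searched, `itertools.product` can simply take `DECAY_PERIODS` as a third argument.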