Understanding Batch Normalization
Authors: Nils Bjorck, Carla P. Gomes, Bart Selman, Kilian Q. Weinberger
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct several experiments, and show that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization. (Abstract) and To investigate batch normalization we will use an experimental setup similar to the original Resnet paper [17]: image classification on CIFAR10 [27] with a 110 layer Resnet. (Section 1.2) |
| Researcher Affiliation | Academia | Johan Bjorck, Carla Gomes, Bart Selman, Kilian Q. Weinberger; Cornell University; {njb225,gomes,selman,kqw4}@cornell.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper points readers to an extended version on arXiv ([4]) for further details; that is the paper itself, not source code. There is no explicit statement about releasing code and no link to a code repository for the methodology. |
| Open Datasets | Yes | To investigate batch normalization we will use an experimental setup similar to the original Resnet paper [17]: image classification on CIFAR10 [27] |
| Dataset Splits | No | The paper mentions training on CIFAR10 and using various initial learning rates, but does not explicitly state the use of a validation set or describe validation splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions general techniques and tools like "SGD" and "data augmentation" but does not specify any software dependencies with version numbers (e.g., library names with specific versions). |
| Experiment Setup | Yes | We use SGD with momentum and weight decay, employ standard data augmentation and image preprocessing techniques and decrease learning rate when learning plateaus, all as in [17] and with the same parameter values. The original network can be trained with initial learning rate 0.1 over 165 epochs, which, however, fails without BN. We always report the best results among initial learning rates from {0.1, 0.003, 0.001, 0.0003, 0.0001, 0.00003} and use enough epochs such that learning plateaus. (See the sketches after this table.) |
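
The data pipeline referenced above (CIFAR10 with the "standard data augmentation and image preprocessing techniques" of the original ResNet setup [17]) is not restated in detail in the paper. Below is a minimal sketch of one common reading of that setup, assuming pad-4 random cropping, horizontal flips, and per-channel normalization; these specific values and the batch sizes are assumptions, not quotes from the paper.

```python
# Hypothetical CIFAR-10 data pipeline sketch. The transform choices below
# (pad-4 random crop, horizontal flip, normalization constants) are assumed,
# following common practice for the ResNet CIFAR setup the paper cites [17].
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)  # commonly used statistics (assumed)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad-and-crop augmentation
    transforms.RandomHorizontalFlip(),      # mirror augmentation
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=test_transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False, num_workers=2)
```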
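
The training protocol from the Experiment Setup row (SGD with momentum and weight decay, a learning-rate decrease when learning plateaus, and reporting the best result across the listed initial learning rates) could be sketched as follows. The momentum and weight-decay values, the plateau criterion, and the use of a torchvision ResNet-18 in place of the paper's 110-layer CIFAR ResNet are all assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical training-protocol sketch: SGD with momentum and weight decay,
# plateau-based learning-rate decay, and a sweep over initial learning rates.
# ResNet-18 (with BatchNorm layers) stands in for the paper's 110-layer ResNet.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def train_one_setting(lr, train_loader, test_loader, epochs=165, device="cpu"):
    model = resnet18(num_classes=10).to(device)   # stand-in model (assumption)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=1e-4)  # assumed values
    # Decrease the learning rate when the evaluation loss plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=10)

    best_acc = 0.0
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Evaluate, track accuracy, and drive the plateau scheduler.
        model.eval()
        correct, total, eval_loss = 0, 0, 0.0
        with torch.no_grad():
            for images, labels in test_loader:
                images, labels = images.to(device), labels.to(device)
                logits = model(images)
                eval_loss += criterion(logits, labels).item()
                correct += (logits.argmax(dim=1) == labels).sum().item()
                total += labels.size(0)
        scheduler.step(eval_loss / max(len(test_loader), 1))
        best_acc = max(best_acc, correct / max(total, 1))
    return best_acc

# Report the best result among the initial learning rates listed in the table.
initial_lrs = [0.1, 0.003, 0.001, 0.0003, 0.0001, 0.00003]
# results = {lr: train_one_setting(lr, train_loader, test_loader) for lr in initial_lrs}
# print(max(results.items(), key=lambda kv: kv[1]))
```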