A Mean Field Theory of Batch Normalization
Authors: Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. In so doing, we provide a precise characterization of signal propagation and gradient backpropagation in wide batch-normalized networks at initialization. Our theory shows that gradient signals grow exponentially in depth and that these exploding gradients cannot be eliminated by tuning the initial weight variances or by adjusting the nonlinear activation function. Indeed, batch normalization itself is the cause of gradient explosion. As a result, vanilla batch-normalized networks without skip connections are not trainable at large depths for common initialization schemes, a prediction that we verify with a variety of empirical simulations. (A hedged code sketch of this gradient-explosion claim appears below the table.) |
| Researcher Affiliation | Industry | Microsoft Research AI; Google Brain. gregyang@microsoft.com, {jpennin,vinaysrao,jaschasd,schsam}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | In Fig. 3 (a) we consider networks trained using SGD on MNIST where we observe that networks deeper than about 50 layers are untrainable regardless of batch size. ... in (d) we train the networks on CIFAR10. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages or sample counts for training, validation, and test sets) to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names with versions). |
| Experiment Setup | Yes | Colors show test accuracy for rectified linear networks with batch normalization and γ = 1, β = 0, ϵ = 10⁻³, N = 384, and η = 10⁻⁵B. (a) trained on MNIST for 10 epochs. (b) trained with fixed batch size 1000 and batch statistics computed over sub-batches of size B. (c) trained using RMSProp. (d) trained on CIFAR10 for 50 epochs. (A hedged sketch of this setup appears below the table.) |
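
The gradient-explosion claim quoted in the Research Type row is easy to probe numerically. The sketch below is not the authors' code: it builds a vanilla fully-connected batch-normalized ReLU network in PyTorch and prints the gradient norm of each weight matrix at initialization. Only the width N = 384 and ϵ = 10⁻³ are taken from the paper; the depth, batch size, and the squared-output readout are illustrative assumptions.

```python
# Hedged sketch, not the paper's code: check that gradient norms grow with depth
# in a vanilla batch-normalized ReLU network at initialization.
import torch
import torch.nn as nn

depth, width, batch = 50, 384, 128            # width and eps from the paper; depth, batch assumed
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.BatchNorm1d(width, eps=1e-3), nn.ReLU()]
net = nn.Sequential(*layers)

x = torch.randn(batch, width)                 # i.i.d. Gaussian inputs
net(x).pow(2).mean().backward()               # any scalar readout suffices for this check

# Gradient norm of each weight matrix, listed from the input layer to the output layer.
# The mean field prediction: norms grow roughly exponentially toward the input layer.
for i, m in enumerate(net):
    if isinstance(m, nn.Linear):
        print(f"layer {i // 3:3d}  ||dL/dW|| = {m.weight.grad.norm().item():.3e}")
```

If the theory holds, the printed norms near the input layer should be orders of magnitude larger than those near the output layer.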
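
The Experiment Setup row can likewise be turned into a minimal training script. The following is a hedged reconstruction, not released code (the paper provides none): γ = 1 and β = 0 are PyTorch's BatchNorm1d defaults, and ϵ = 10⁻³, N = 384, SGD, 10 MNIST epochs, and η = 10⁻⁵·B come from the caption, while the depth, the batch size B, and the data-loading details are assumptions.

```python
# Hedged sketch of the reported panel (a) setup: ReLU + BatchNorm MLP trained with SGD on MNIST.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

depth, N, B = 60, 384, 256                              # depth and B illustrative; N from the paper
blocks = [nn.Flatten(), nn.Linear(784, N)]
for _ in range(depth - 1):
    blocks += [nn.BatchNorm1d(N, eps=1e-3), nn.ReLU(), nn.Linear(N, N)]
blocks += [nn.BatchNorm1d(N, eps=1e-3), nn.ReLU(), nn.Linear(N, 10)]
model = nn.Sequential(*blocks)

opt = torch.optim.SGD(model.parameters(), lr=1e-5 * B)  # learning rate eta = 10^-5 * B
train = datasets.MNIST(".", train=True, download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train, batch_size=B, shuffle=True)

for epoch in range(10):                                 # the paper trains MNIST for 10 epochs
    for x, y in loader:
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
```

At depths beyond roughly 50 layers this configuration is the regime that the Open Datasets row quotes as untrainable regardless of batch size (Fig. 3a of the paper).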