Batch Normalization Orthogonalizes Representations in Deep Random Networks

Authors: Hadi Daneshmand, Amir Joudaki, Francis Bach

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments presented in Fig. 2a validate the exponential decay rate of V with depth. In this plot, we see that log(V_ℓ) decreases linearly for ℓ = 1, ..., 20, then it wiggles around a small constant. Our experiments in Fig. 2b suggest that the O(1/√d) dependency on width is almost tight. (A minimal reproduction of this measurement is sketched after this table.)
Researcher Affiliation | Academia | Hadi Daneshmand, INRIA Paris, seyed.daneshmand@inria.fr; Amir Joudaki, ETH Zurich, amir.joudaki@inf.ethz.ch; Francis Bach, INRIA-ENS-PSL Paris, francis.bach@inria.fr
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Implementations are available at https://github.com/hadidaneshmand/batchnorm21.git
Open Datasets | Yes | The learning task is classification with cross-entropy loss on the CIFAR10 dataset (Krizhevsky et al., 2009, MIT license).
Dataset Splits | No | The paper mentions using the CIFAR10 dataset and a batch size for training, but does not specify exact training/validation/test splits or provide citations to predefined splits for reproduction.
Hardware Specification | Yes | We use PyTorch (Paszke et al., 2019, BSD license) and the Google Colaboratory platform with a single Tesla-P100 GPU with 16GB memory in all the experiments.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019, BSD license) but does not list specific version numbers for its software dependencies.
Experiment Setup | Yes | Throughout the experiments, we use a vanilla MLP (without BN) with a width of 800 across all hidden layers, ReLU activation, and Xavier's method for weight initialization (Glorot and Bengio, 2010). We use SGD with stepsize 0.01 and batch size 500 for training. (A sketch of this setup follows the depth-experiment sketch below.)
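
To make the Research Type row above concrete, here is a minimal sketch (ours, not the authors' released code from the repository linked above) of the depth experiment it quotes: propagate a Gaussian mini-batch through a randomly initialized network with batch normalization and track how the orthogonality gap shrinks with depth. The width d = 800 and batch size n = 500 are taken from the experiment-setup row; computing the gap V(H) as the Frobenius distance between the normalized Gram matrix H Hᵀ/‖H‖_F² and I/n, using linear-plus-BN layers without activation, and scaling the Gaussian weights by 1/√d are our assumptions about the intended measurement.

# Minimal sketch (not the authors' released code): track the orthogonality gap
# of mini-batch representations across depth in a random network with BN.
import math
import torch

torch.manual_seed(0)
d, n, depth = 800, 500, 50           # width, mini-batch size, number of layers (depth is an assumed value)

def orthogonality_gap(h):
    """Frobenius distance between the normalized Gram matrix and I/n (assumed definition of V)."""
    gram = h @ h.t()
    return torch.norm(gram / torch.norm(h) ** 2 - torch.eye(n) / n).item()

def batch_norm(h, eps=1e-5):
    """Per-feature batch normalization over the mini-batch, no learnable affine parameters."""
    return (h - h.mean(dim=0)) / (h.std(dim=0, unbiased=False) + eps)

h = torch.randn(n, d)                # rows are the samples of one mini-batch
for layer in range(1, depth + 1):
    w = torch.randn(d, d) / d ** 0.5 # Gaussian weights, variance 1/d (assumption)
    h = batch_norm(h @ w)            # linear layer followed by BN
    if layer % 5 == 0:
        gap = orthogonality_gap(h)
        print(f"layer {layer:3d}  V = {gap:.2e}  log V = {math.log(gap):.2f}")

Plotting log V against the layer index should show the qualitative behaviour described in the quoted response: a roughly linear decrease over the first layers, then a plateau governed by the width-dependent term.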
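
The Experiment Setup row translates into the following training sketch, again with assumptions for details the row does not state: the number of hidden layers (three here), the standard torchvision CIFAR10 loader with a ./data root, a ToTensor-only transform, and a single epoch of training. Width 800, ReLU, Xavier initialization, cross-entropy loss, SGD with stepsize 0.01, and batch size 500 come from the quoted text.

# Minimal sketch of the reported training configuration (not the authors' script):
# vanilla MLP, hidden width 800, ReLU, Xavier init, CIFAR10, cross-entropy,
# SGD with stepsize 0.01 and batch size 500.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def make_mlp(num_hidden_layers=3, width=800, num_classes=10):
    """Vanilla MLP on flattened CIFAR10 images; layer count is an assumption."""
    layers, in_dim = [], 3 * 32 * 32
    for _ in range(num_hidden_layers):
        linear = nn.Linear(in_dim, width)
        nn.init.xavier_uniform_(linear.weight)   # Xavier (Glorot) initialization
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, num_classes))
    return nn.Sequential(nn.Flatten(), *layers)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=500, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = make_mlp().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                    # one epoch shown for brevity
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()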