Batch Normalization Orthogonalizes Representations in Deep Random Networks
Authors: Hadi Daneshmand, Amir Joudaki, Francis Bach
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments presented in Fig. 2a validate the exponential decay rate of V with depth. In this plot, we see that log(V_ℓ) decreases linearly for ℓ = 1, ..., 20, then fluctuates around a small constant. Our experiments in Fig. 2b suggest that the O(1/√d) dependency on width is almost tight. (See the orthogonality-gap sketch after the table.) |
| Researcher Affiliation | Academia | Hadi Daneshmand (INRIA Paris, seyed.daneshmand@inria.fr); Amir Joudaki (ETH Zurich, amir.joudaki@inf.ethz.ch); Francis Bach (INRIA-ENS-PSL Paris, francis.bach@inria.fr) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Implementations are available at https://github.com/hadidaneshmand/batchnorm21.git |
| Open Datasets | Yes | The learning task is classification with cross entropy loss for CIFAR10 dataset (Krizhevsky et al., 2009, MIT license). |
| Dataset Splits | No | The paper mentions using the CIFAR10 dataset and a batch size for training, but does not specify exact training/validation/test splits or provide citations to predefined splits for reproduction. |
| Hardware Specification | Yes | We use PyTorch (Paszke et al., 2019, BSD license) and Google Colaboratory platform with a single Tesla P100 GPU with 16GB memory in all the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch (Paszke et al., 2019, BSD license) and the Google Colaboratory platform, but does not specify version numbers for the software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | Throughout the experiments, we use a vanilla MLP (without BN) with a width of 800 across all hidden layers, ReLU activation, and Xavier's method for weight initialization (Glorot and Bengio, 2010). We use SGD with step size 0.01 and batch size 500 for training. (See the configuration sketch after the table.) |
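
The Research Type row refers to the decay of an orthogonality measure V_ℓ with depth and its O(1/√d) floor in width. Below is a minimal orthogonality-gap sketch of such a depth sweep, assuming V is the Frobenius distance between the normalized Gram matrix of the batch representations and a scaled identity, and assuming random Gaussian linear layers followed by standard batch normalization; the sizes (width 800, batch 64, 30 layers) and the exact BN variant are illustrative assumptions, not the paper's protocol.

```python
# Illustrative sketch (not the paper's code): propagate a random batch through a
# deep random network with batch normalization and track an orthogonality-gap
# proxy V_l per layer. The definition of V and all sizes are assumptions.
import torch

def orthogonality_gap(h):
    """Frobenius distance between the normalized Gram matrix of the batch
    representations h (shape B x d) and the scaled identity I_B / B."""
    B = h.shape[0]
    gram = (h @ h.t()) / h.norm() ** 2      # Gram matrix normalized by ||h||_F^2
    return (gram - torch.eye(B) / B).norm().item()

def batch_norm(h, eps=1e-5):
    """Standard BN across the batch dimension (mean-center and rescale each
    coordinate); the paper's exact BN variant may differ."""
    return (h - h.mean(dim=0)) / (h.std(dim=0, unbiased=False) + eps)

torch.manual_seed(0)
d, B, depth = 800, 64, 30                   # width, batch size, layers (assumed)
h = torch.randn(B, d)                       # random input batch

gaps = []
for _ in range(depth):
    W = torch.randn(d, d) / d ** 0.5        # random Gaussian weights, variance 1/d
    h = batch_norm(h @ W.t())               # linear map followed by batch norm
    gaps.append(orthogonality_gap(h))

# log(V_l) should decrease roughly linearly with depth, then plateau at a
# width-dependent floor (compare Fig. 2a/2b in the paper).
print(["%.2e" % g for g in gaps])
```

Repeating the same sweep for several widths d gives a rough picture of how the plateau scales with width.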
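The Experiment Setup row translates into roughly the following training configuration. This is a sketch under stated assumptions: the width of 800, ReLU activation, Xavier initialization, SGD with step size 0.01, batch size 500, and CIFAR10 with cross-entropy loss come from the table above, while the network depth, epoch count, and data transforms are placeholders chosen for illustration.

```python
# Hypothetical reconstruction of the quoted training setup, not the authors' code.
# Width 800, ReLU, Xavier init, SGD lr 0.01, batch size 500, CIFAR10 + cross-entropy
# come from the paper's description; depth, epochs, and transforms are assumed.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

def make_mlp(depth=10, width=800, num_classes=10):
    layers, in_dim = [], 3 * 32 * 32        # CIFAR10 images flattened
    for _ in range(depth):                  # depth is an assumption
        linear = nn.Linear(in_dim, width)
        nn.init.xavier_uniform_(linear.weight)   # Xavier initialization
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, num_classes))
    return nn.Sequential(nn.Flatten(), *layers)

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=500, shuffle=True)

model = make_mlp()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # step size 0.01
criterion = nn.CrossEntropyLoss()

for epoch in range(1):                      # epoch count is an assumption
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```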