Online Normalization for Training Neural Networks

Authors: Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present formal proofs and experimental results on ImageNet, CIFAR, and PTB datasets."
Researcher Affiliation | Industry | "Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofía Samaniego de la Fuente, Vishal Subbiah, Michael James. Cerebras Systems, 175 S. San Antonio Road, Los Altos, California 94022"
Pseudocode | Yes | "To define Online Normalization (Figure 6), we replace arithmetic averages over the full dataset in (2) with exponentially decaying averages of online samples. Similarly, projections in (4) and (5) are computed over online data using exponentially decaying inner products. The decay factors αf and αb for the forward and backward passes, respectively, are hyperparameters of the technique."
Open Source Code | Yes | "Scripts to reproduce our results are in the companion repository [3]." [3] Vitaliy Chiley, Michael James, and Ilya Sharapov. Online Normalization reference implementation. https://github.com/cerebras/online-normalization, 2019.
Open Datasets | Yes | "We present formal proofs and experimental results on ImageNet, CIFAR, and PTB datasets."
Dataset Splits | Yes | "Online Normalization had the best validation performance of all compared methods."
Hardware Specification | No | The paper mentions training on "a single GPU" for the CIFAR experiments but does not provide specifics such as the GPU model, CPU model, memory, or cloud instance types used for any experiment.
Software Dependencies | No | The paper mentions providing "reference code in PyTorch, TensorFlow, and C" but does not specify version numbers for these or for any other libraries.
Experiment Setup | Yes | "Our experiments start with the best-published hyperparameter settings for ResNet-20 [2] for use with Batch Normalization on a single GPU. We accept these hyperparameters as fixed values for use with Online Normalization. Online Normalization introduces two hyperparameters, decay rates αf and αb. We used a logarithmic grid sweep to determine good settings."
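The Pseudocode row describes the core idea: batch statistics are replaced by exponentially decaying averages of online samples, governed by the forward-pass decay factor αf. The following is a minimal sketch of that forward-pass update under assumed notation; the class name, the specific update form, and the epsilon value are illustrative and not taken from the paper's reference implementation.

```python
import numpy as np

class OnlineNormForward:
    """Sketch of an Online Normalization forward pass: per-feature mean and
    variance are tracked with exponentially decaying averages (decay alpha_f)
    instead of being recomputed over a batch. Illustrative, not the paper's code."""

    def __init__(self, num_features, alpha_f=0.999, eps=1e-5):
        self.alpha_f = alpha_f
        self.eps = eps
        self.mu = np.zeros(num_features)   # running mean estimate
        self.var = np.ones(num_features)   # running variance estimate

    def __call__(self, x):
        # Normalize the incoming sample with the current running statistics.
        y = (x - self.mu) / np.sqrt(self.var + self.eps)
        # Exponentially decaying updates (assumed form): variance is updated
        # from the deviation of x against the previous mean, then the mean decays
        # toward x.
        self.var = self.alpha_f * self.var \
            + self.alpha_f * (1.0 - self.alpha_f) * (x - self.mu) ** 2
        self.mu = self.alpha_f * self.mu + (1.0 - self.alpha_f) * x
        return y
```

Because the statistics are updated one sample at a time, this formulation needs no batch dimension, which is what makes it usable at batch size 1.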
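The Experiment Setup row states that good values for the decay rates αf and αb were found with a logarithmic grid sweep. A sketch of such a sweep is below; the grid bounds, the number of points, and the 1 − α parameterization are assumptions for illustration, not the paper's actual search space.

```python
import numpy as np

# Hypothetical logarithmic grid over decay rates alpha_f and alpha_b.
# Sweeping 1 - alpha logarithmically spaces candidates densely near 1,
# where decay rates for running statistics typically live.
one_minus_alpha = np.logspace(-4, -1, num=4)   # 1e-4, 1e-3, 1e-2, 1e-1
candidates = [(1.0 - af, 1.0 - ab)
              for af in one_minus_alpha
              for ab in one_minus_alpha]        # all (alpha_f, alpha_b) pairs
```

Each candidate pair would then be evaluated by a full training run, with the remaining hyperparameters held at the fixed Batch Normalization settings described above.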