Online Normalization for Training Neural Networks
Authors: Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofía Samaniego de la Fuente, Vishal Subbiah, Michael James
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present formal proofs and experimental results on ImageNet, CIFAR, and PTB datasets. |
| Researcher Affiliation | Industry | Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofía Samaniego de la Fuente, Vishal Subbiah, Michael James. Cerebras Systems, 175 S. San Antonio Road, Los Altos, California 94022 |
| Pseudocode | Yes | To define Online Normalization (Figure 6), we replace arithmetic averages over the full dataset in (2) with exponentially decaying averages of online samples. Similarly, projections in (4) and (5) are computed over online data using exponentially decaying inner products. The decay factors αf and αb for forward and backward passes respectively are hyperparameters for the technique. |
| Open Source Code | Yes | Scripts to reproduce our results are in the companion repository [3]. [3] Vitaliy Chiley, Michael James, and Ilya Sharapov. Online Normalization reference implementation. https://github.com/cerebras/online-normalization, 2019. |
| Open Datasets | Yes | We present formal proofs and experimental results on ImageNet, CIFAR, and PTB datasets. |
| Dataset Splits | Yes | Online Normalization had the best validation performance of all compared methods. |
| Hardware Specification | No | The paper mentions training on 'a single GPU' for CIFAR experiments but does not provide specific details such as GPU model, CPU model, memory, or cloud instance types used for any of the experiments. |
| Software Dependencies | No | The paper mentions providing 'reference code in PyTorch, TensorFlow, and C' but does not specify version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | Our experiments start with the best-published hyperparameter settings for ResNet-20 [2] for use with Batch Normalization on a single GPU. We accept these hyperparameters as fixed values for use with Online Normalization. Online Normalization introduces two hyperparameters, decay rates αf and αb. We used a logarithmic grid sweep to determine good settings. |
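The core idea quoted above (replacing full-dataset averages with exponentially decaying averages, governed by a forward decay αf) can be sketched in a few lines. This is an illustrative simplification for a stream of scalar samples, not the authors' reference implementation (that lives in the companion repository); the function name, scalar formulation, and initial statistics are assumptions made here for clarity.

```python
import numpy as np

def online_norm_forward(x_stream, alpha_f=0.999, eps=1e-5):
    """Illustrative sketch of the forward pass of Online Normalization
    for a stream of scalar samples: each sample is normalized with the
    current running statistics, which are then updated with exponential
    decay alpha_f (the forward decay hyperparameter from the paper)."""
    mu, var = 0.0, 1.0  # assumed initial running mean and variance
    out = []
    for x in x_stream:
        y = (x - mu) / np.sqrt(var + eps)  # normalize with current stats
        # update variance before the mean, so it uses the old mu
        var = alpha_f * var + (1.0 - alpha_f) * (x - mu) ** 2
        mu = alpha_f * mu + (1.0 - alpha_f) * x
        out.append(y)
    return np.array(out)
```

The logarithmic grid sweep mentioned in the experiment setup could, for example, iterate `alpha_f` over `1 - np.logspace(-4, -1, num=4)` (i.e. decay rates like 0.9999, 0.999, 0.99, 0.9), though the paper's exact grid is not quoted here.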