Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Authors: Sergey Ioffe
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate Batch Renormalization, we applied it to the problem of image classification. Our baseline model is Inception v3 [13], trained on 1000 classes from ImageNet training set [9], and evaluated on the ImageNet validation data. |
| Researcher Affiliation | Industry | Sergey Ioffe, Google, sioffe@google.com |
| Pseudocode | Yes | Algorithm 1: Training (top) and inference (bottom) with Batch Renormalization, applied to activation x over a mini-batch. (A minimal sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper does not provide any explicit links to open-source code for the described methodology or state that the code is being released. |
| Open Datasets | Yes | Our baseline model is Inception v3 [13], trained on 1000 classes from ImageNet training set [9], and evaluated on the ImageNet validation data. |
| Dataset Splits | Yes | Our baseline model is Inception v3 [13], trained on 1000 classes from ImageNet training set [9], and evaluated on the ImageNet validation data. |
| Hardware Specification | No | The paper mentions 'The training used 50 synchronized workers [3]' but does not provide specific details about the hardware used (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions the use of 'RMSProp optimizer [14]' and 'ReLU [8]', which are techniques, but it does not specify any software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The training used 50 synchronized workers [3]. Each worker processed a minibatch of 32 examples per training step. [...] For Batch Renorm, we used rmax = 1, dmax = 0 (i.e. simply batchnorm) for the first 5000 training steps, after which these were gradually relaxed to reach rmax = 3 at 40k steps, and dmax = 5 at 25k steps. [...] we used relatively fast updates to the moving statistics µ and σ, with α = 0.01. |
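
For context on the Pseudocode row, below is a minimal NumPy sketch of the training and inference passes described by Algorithm 1. The function names, the `eps` value, and the treatment of `r` and `d` as plain constants (no explicit stop-gradient, since only the forward pass is shown) are illustrative assumptions, not code from the paper.

```python
import numpy as np

def batch_renorm_train(x, gamma, beta, moving_mean, moving_std,
                       r_max, d_max, alpha=0.01, eps=1e-3):
    """Forward pass of Batch Renormalization on a minibatch x of shape (N, D)."""
    # Per-feature minibatch statistics.
    mu_b = x.mean(axis=0)
    sigma_b = np.sqrt(x.var(axis=0) + eps)

    # Correction factors r and d, clipped to [1/r_max, r_max] and [-d_max, d_max];
    # with r_max = 1 and d_max = 0 this reduces to ordinary batch normalization.
    r = np.clip(sigma_b / moving_std, 1.0 / r_max, r_max)
    d = np.clip((mu_b - moving_mean) / moving_std, -d_max, d_max)

    x_hat = (x - mu_b) / sigma_b * r + d
    y = gamma * x_hat + beta

    # Moving statistics, updated with the "relatively fast" rate alpha = 0.01
    # quoted in the Experiment Setup row.
    moving_mean = moving_mean + alpha * (mu_b - moving_mean)
    moving_std = moving_std + alpha * (sigma_b - moving_std)
    return y, moving_mean, moving_std

def batch_renorm_infer(x, gamma, beta, moving_mean, moving_std):
    """Inference uses the moving statistics only, as in standard batchnorm."""
    return gamma * (x - moving_mean) / moving_std + beta
```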
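
The rmax/dmax schedule quoted in the Experiment Setup row could be implemented as below. The linear ramp is an assumption (the paper only says the bounds were "gradually relaxed"), and the helper name is hypothetical.

```python
def clipping_bound(step, hold_until, ramp_end, start_value, end_value):
    """Hold the bound at start_value, then relax it linearly until ramp_end."""
    if step <= hold_until:
        return start_value
    frac = min((step - hold_until) / (ramp_end - hold_until), 1.0)
    return start_value + frac * (end_value - start_value)

# r_max = 1, d_max = 0 (plain batchnorm) for the first 5000 steps, then
# relaxed to reach r_max = 3 at 40k steps and d_max = 5 at 25k steps.
step = 20_000  # example global training step
r_max = clipping_bound(step, 5_000, 40_000, 1.0, 3.0)  # ~1.86 at step 20k
d_max = clipping_bound(step, 5_000, 25_000, 0.0, 5.0)  # 3.75 at step 20k
```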