Four Things Everyone Should Know to Improve Batch Normalization

Authors: Cecilia Summers, Michael J. Dinneen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our results empirically on six datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers-102, CUB-2011, and ImageNet.
Researcher Affiliation | Academia | Cecilia Summers, Department of Computer Science, University of Auckland (cecilia.summers.07@gmail.com); Michael J. Dinneen, Department of Computer Science, University of Auckland (mjd@cs.auckland.ac.nz)
Pseudocode | No | The paper contains mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We have released code at https://github.com/ceciliaresearch/four_things_batch_norm.
Open Datasets | Yes | We validate our results empirically on six datasets: CIFAR-100, SVHN, Caltech-256, Oxford Flowers-102, CUB-2011, and ImageNet. ... ImageNet ILSVRC 2012 validation set (Russakovsky et al., 2015) ... CIFAR-100 (Krizhevsky & Hinton, 2009) ... SVHN (Netzer et al., 2011) ... Flowers-102 (Nilsback & Zisserman, 2008) ... CUB-2011 (Wah et al., 2011)
Dataset Splits | Yes | Of the six datasets we experiment with, only ImageNet (Russakovsky et al., 2015) and Flowers-102 (Nilsback & Zisserman, 2008) have their own pre-defined validation split, so we constructed validation splits for the other datasets as follows: for CIFAR-100 (Krizhevsky & Hinton, 2009), we randomly took 40,000 of the 50,000 training images for the training split, and the remaining 10,000 as a validation split. For SVHN (Netzer et al., 2011), we similarly split the 604,388 non-test images in an 80-20% split for training and validation. For Caltech-256, no canonical splits of any form are defined, so we used 40 images of each of the 256 categories for training, 10 images for validation, and 30 for testing. For CUB-2011, we used 25% of the given training data as a validation set. (See the split sketch after the table.)
Hardware Specification | Yes | All experiments were done on two Nvidia Geforce GTX 1080 Ti GPUs.
Software Dependencies | No | The paper mentions the TensorFlow-Slim image classification model library but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | The model used for CIFAR-100 and SVHN was ResNet-18 (He et al., 2016b;a) with 64, 128, 256, and 512 filters across blocks. For Caltech-256, a much larger Inception-v3 (Szegedy et al., 2016) model was used, and we additionally experiment with ResNet-152 (He et al., 2016b) on Flowers-102 and CUB-2011 in Sec. 4.3. All experiments were done on two Nvidia Geforce GTX 1080 Ti GPUs. ... with an overall batch size of B and a ghost batch size of B ... with a batch size B = 128. (See the Ghost Batch Norm sketch after the table.)
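
To make the Dataset Splits row concrete, here is a minimal sketch of how the described validation splits could be constructed. It assumes index-based access to each dataset; the `make_split` helper is hypothetical and is not taken from the paper's released code.

```python
import numpy as np

def make_split(num_examples, num_val, seed=0):
    """Randomly partition example indices into train/validation index sets."""
    rng = np.random.default_rng(seed)  # fixed seed so the split is reproducible
    perm = rng.permutation(num_examples)
    return perm[num_val:], perm[:num_val]

# CIFAR-100: 40,000 of the 50,000 training images for training, 10,000 for validation.
cifar_train_idx, cifar_val_idx = make_split(50_000, 10_000)

# SVHN: the 604,388 non-test images split 80-20% into training and validation.
n_svhn = 604_388
svhn_train_idx, svhn_val_idx = make_split(n_svhn, round(0.2 * n_svhn))

# Caltech-256 instead uses a per-category split: 40 train / 10 val / 30 test
# images per class, so make_split would be applied within each category.
```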
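
The "ghost batch size" in the Experiment Setup row refers to Ghost Batch Normalization (Hoffer et al., 2017), one of the four methods the paper studies. Below is a minimal PyTorch sketch under the assumption that all ghost batches share one set of affine parameters and running statistics; `GhostBatchNorm2d` is a hypothetical illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GhostBatchNorm2d(nn.Module):
    """Sketch of Ghost Batch Norm: normalize each 'ghost' sub-batch independently.

    Splits a batch of size B into chunks of size `ghost_batch_size` and applies
    the same BatchNorm2d (shared affine parameters and running statistics) to
    each chunk, so normalization statistics come from the smaller ghost batch.
    """
    def __init__(self, num_features, ghost_batch_size):
        super().__init__()
        self.ghost_batch_size = ghost_batch_size
        self.bn = nn.BatchNorm2d(num_features)

    def forward(self, x):
        if not self.training:
            return self.bn(x)  # use accumulated running statistics at test time
        chunks = x.split(self.ghost_batch_size, dim=0)
        return torch.cat([self.bn(chunk) for chunk in chunks], dim=0)

# Usage with the overall batch size quoted above (B = 128) and a ghost batch size of 16.
gbn = GhostBatchNorm2d(num_features=64, ghost_batch_size=16)
out = gbn(torch.randn(128, 64, 8, 8))
```

At test time the sketch simply falls back to the accumulated running statistics, which sidesteps the question of how ghost batches interact with inference.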