Don't Decay the Learning Rate, Increase the Batch Size
Authors: Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In section 5.1, we demonstrate that decreasing the learning rate and increasing the batch size during training are equivalent. In section 5.2, we show we can further reduce the number of parameter updates by increasing the effective learning rate and scaling the batch size. In section 5.3 we apply our insights to train Inception-ResNet-V2 on ImageNet, using vast batches of up to 65536 images. Finally in section 5.4, we train ResNet-50 to 76.1% ImageNet validation accuracy within 30 minutes. |
| Researcher Affiliation | Industry | Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying & Quoc V. Le Google Brain {slsmith, pikinder, chrisying, qvl}@google.com |
| Pseudocode | No | The paper describes mathematical formulations and experimental procedures in narrative text, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We train ResNet-50 on ImageNet to 76.1% validation accuracy in under 30 minutes. Our first experiments are performed on CIFAR-10, using a 16-4 wide ResNet architecture, following the implementation of Zagoruyko & Komodakis (2016). |
| Dataset Splits | No | The paper mentions training on CIFAR-10 (50000 training images) and ImageNet, and reports 'validation accuracy' and 'test set accuracy', implying standard splits for these public datasets. However, it does not explicitly state the percentages or sample counts for training, validation, and test dataset splits. |
| Hardware Specification | Yes | To confirm that increasing the batch size during training can reduce model training times, we replicated the set-up described by Goyal et al. (2017) on a half TPU pod, comprising 256 tensorcores (Jouppi et al., 2017). |
| Software Dependencies | No | The paper states 'Using TensorFlow' but does not specify a version number for TensorFlow or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We use ghost batch norm (Hoffer et al., 2017), with a ghost batch size of 128. The original training schedule follows the implementation of Zagoruyko & Komodakis (2016), using an initial learning rate of 0.1 which decays by a factor of 5 at each step, a momentum coefficient of 0.9, and a batch size of 128. (Many more specific values are given throughout Section 5; a minimal schedule sketch follows the table.) |
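
To make the paper's central recipe concrete, below is a minimal sketch (not the authors' code) of swapping a step-wise learning-rate decay for an equivalent step-wise batch-size increase, using the CIFAR-10 hyperparameters quoted above (initial learning rate 0.1, decay factor 5, batch size 128). The epoch boundaries, the `max_batch` cap, and the function names are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: compares a conventional LR-decay schedule with the paper's
# alternative of growing the batch size by the same factor at each step,
# falling back to LR decay once the batch size saturates.

DECAY_FACTOR = 5
BOUNDARIES = [60, 120, 160]  # assumed epochs at which the schedule "steps"

def lr_decay_schedule(epoch, base_lr=0.1, batch_size=128):
    """Conventional schedule: learning rate decays, batch size stays fixed."""
    steps = sum(epoch >= b for b in BOUNDARIES)
    return base_lr / DECAY_FACTOR ** steps, batch_size

def batch_size_increase_schedule(epoch, base_lr=0.1, base_batch=128,
                                 max_batch=5120):
    """Equivalent schedule in the spirit of the paper: hold the learning rate
    fixed and multiply the batch size by the same factor at each step, capped
    at max_batch; after the cap is hit, decay the learning rate instead."""
    steps = sum(epoch >= b for b in BOUNDARIES)
    batch = base_batch * DECAY_FACTOR ** steps
    if batch <= max_batch:
        return base_lr, batch
    # Residual learning-rate decay once the batch size has saturated.
    overflow = batch / max_batch
    return base_lr / overflow, max_batch

if __name__ == "__main__":
    for epoch in (0, 60, 120, 160):
        print(epoch, lr_decay_schedule(epoch),
              batch_size_increase_schedule(epoch))
```

Run standalone, this prints matching ratios of learning rate to batch size for the two schedules at each boundary, which is the equivalence the paper demonstrates empirically in Section 5.1.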