On the Origin of Implicit Regularization in Stochastic Gradient Descent
Authors: Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify empirically that explicitly including the implicit regularizer in the loss can enhance the test accuracy when the learning rate is small. ... In Section 2.3, we confirm empirically that the implicit regularizer can enhance the test accuracy of deep networks. (A minimal sketch of this explicitly regularized loss appears after the table.) |
| Researcher Affiliation | Industry | Samuel L. Smith¹, Benoit Dherin², David G. T. Barrett¹ and Soham De¹; ¹DeepMind, ²Google. {slsmith, dherin, barrettdavid, sohamde}@google.com |
| Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present any structured steps in a code-like format. |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code for the methodology described, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We train the same model with two different (explicit) loss functions. ... We use a 10-1 Wide-ResNet model (Zagoruyko & Komodakis, 2016) for classification on CIFAR-10. ... In this section we provide additional experiments on the Fashion-MNIST dataset (Xiao et al., 2017)... |
| Dataset Splits | No | The paper discusses 'training' and 'test' sets and 'test accuracy', but it does not specify any training/test/validation dataset splits, such as percentages, sample counts for each split, or references to predefined validation splits with citations. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only refers to models like the '10-1 Wide-ResNet', which is a network architecture, not hardware. |
| Software Dependencies | No | The paper mentions using 'SGD without Momentum' and 'training without batch normalization', which are algorithmic choices, but does not provide specific software names with version numbers for any libraries, frameworks, or environments used in the experiments. |
| Experiment Setup | Yes | We train for 6400 epochs at batch size 32 without learning rate decay using SGD without Momentum. We use standard data augmentation including crops and random flips, and we use weight decay with L2 coefficient 5 x 10^-4. ... We use a batch size B = 16 unless otherwise specified, and we do not use weight decay. (These settings are transcribed as a configuration sketch after the table.) |
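
As a concrete illustration of the Research Type evidence above: the implicit regularizer identified in the paper is a penalty on the squared norm of the mini-batch gradients, scaled by the learning rate, and the authors verify empirically that adding it explicitly to the loss can improve test accuracy at small learning rates. The snippet below is a minimal JAX sketch of such an explicitly regularized mini-batch loss; it is not the authors' code, and `loss_fn`, `params`, `batch`, `eps` and `lr` are placeholder names.

```python
# Minimal JAX sketch (not the authors' released code): a mini-batch loss with
# the implicit regularizer added explicitly, C_k(w) + (eps/4) * ||grad C_k(w)||^2.
# `loss_fn(params, batch)` stands in for any differentiable scalar training loss.
import jax
import jax.numpy as jnp

def regularized_loss(params, batch, loss_fn, eps):
    """Mini-batch cost plus an explicit squared-gradient-norm penalty."""
    cost = loss_fn(params, batch)
    grads = jax.grad(loss_fn)(params, batch)          # gradient of the mini-batch loss
    sq_norm = sum(jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads))
    return cost + 0.25 * eps * sq_norm

# Training then differentiates through the modified loss (second-order autodiff):
#   grads = jax.grad(regularized_loss)(params, batch, loss_fn, eps)
#   params = jax.tree_util.tree_map(lambda w, g: w - lr * g, params, grads)
```

Averaging this per-batch penalty over the m mini-batches in an epoch recovers the (eps/4m) Σ_k ||∇Ĉ_k(w)||² form of the modified loss analysed in the paper.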
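
The Experiment Setup evidence can likewise be read as a compact hyperparameter record. The dictionaries below transcribe the quoted settings; the keys are illustrative, and this is not a configuration file released with the paper.

```python
# Transcription of the quoted settings (illustrative keys, not an official config).
cifar10_setup = dict(
    model="10-1 Wide-ResNet",            # Zagoruyko & Komodakis (2016)
    dataset="CIFAR-10",
    optimizer="SGD without momentum",
    epochs=6400,
    batch_size=32,
    lr_schedule="constant (no decay)",
    weight_decay=5e-4,                   # L2 coefficient
    augmentation=("random crop", "random flip"),
)

fashion_mnist_setup = dict(              # additional Fashion-MNIST experiments
    dataset="Fashion-MNIST",
    batch_size=16,                       # "unless otherwise specified"
    weight_decay=0.0,                    # no weight decay
)
```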