An Exponential Learning Rate Schedule for Deep Learning

Authors: Zhiyuan Li, Sanjeev Arora

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 4, we experimentally verify our theoretical findings on CNNs and ResNets. We also construct better exponential LR schedules by incorporating the Cosine LR schedule on CIFAR10, which opens the possibility of even more general theory of rate schedule tuning towards better performance."
Researcher Affiliation | Academia | Zhiyuan Li (Princeton University, zhiyuanli@cs.princeton.edu); Sanjeev Arora (Princeton University and Institute for Advanced Study, arora@cs.princeton.edu)
Pseudocode | No | The paper describes the SGD with Momentum and Weight Decay algorithm using mathematical formulas (Definition 1.2), but does not present it in a pseudocode block or algorithm box; a minimal sketch of the update rule appears after this table.
Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available.
Open Datasets | Yes | "We train PreResNet32 on CIFAR10. The initial learning rate is 0.1 and the momentum is 0.9 in all settings."
Dataset Splits | No | The paper mentions CIFAR10 as the dataset used for training and experiments, but does not specify the exact training/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2017)" but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Settings: We train PreResNet32 on CIFAR10. The initial learning rate is 0.1 and the momentum is 0.9 in all settings. We fix all the scalar and bias of BN, because otherwise they together with the following conv layer grow exponentially, sometimes exceeding the range of Float32 when trained with large growth rate for a long time. We fix the parameters in the last fully connected layer for scale invariance of the objective."
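
As noted in the Pseudocode row, the paper states the SGD with Momentum and Weight Decay update only as formulas (Definition 1.2). The following is a minimal sketch in PyTorch, assuming a common velocity formulation of that update; the paper's exact bookkeeping may differ in minor details, and the function name and default hyperparameters are illustrative, not taken from the paper.

    import torch

    def sgd_momentum_wd_step(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
        """One in-place step of SGD with momentum and weight decay.

        Sketch of the update written out in formulas in Definition 1.2:
            v <- momentum * v - lr * (grad + weight_decay * w)
            w <- w + v
        The weight_decay default here is illustrative, not from the paper.
        """
        v.mul_(momentum).add_(grad + w * weight_decay, alpha=-lr)
        w.add_(v)
        return w, v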
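The Research Type row quotes the paper's central experimental claim: on scale-invariant networks, SGD with weight decay at a fixed learning rate behaves like SGD without weight decay under an exponentially growing learning rate. The sketch below shows one way to run such a growing schedule in PyTorch using torch.optim.lr_scheduler.ExponentialLR; the gamma > 1 value is illustrative only, since the paper derives the exact growth factor from the learning rate, momentum, and weight-decay constants.

    import torch

    # Hypothetical stand-in network; the paper's setting assumes a
    # scale-invariant model (e.g., one whose output is normalized by BN).
    model = torch.nn.Linear(10, 10)

    # No weight decay: the exponentially growing learning rate plays its
    # role. gamma = 1.0005 is illustrative, not the paper's derived rate.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=1.0005)

    for step in range(1000):
        opt.zero_grad()
        loss = model(torch.randn(32, 10)).pow(2).mean()  # stand-in loss
        loss.backward()
        opt.step()
        sched.step()  # learning rate multiplied by gamma after every step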
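Finally, the Experiment Setup row quotes the freezing of the BN scale/bias parameters and of the last fully connected layer. A minimal sketch of that preparation in PyTorch, assuming a stock torchvision ResNet as a stand-in for the paper's PreResNet32 (the architecture and the weight-decay value below are illustrative assumptions):

    import torch
    import torchvision

    # Stand-in architecture, not the paper's PreResNet32.
    model = torchvision.models.resnet18(num_classes=10)

    # Freeze all BatchNorm scale (weight) and bias parameters, as the
    # paper does to keep them from growing exponentially during training.
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.weight.requires_grad_(False)
            m.bias.requires_grad_(False)

    # Freeze the last fully connected layer so the objective is
    # scale-invariant in the remaining parameters.
    for p in model.fc.parameters():
        p.requires_grad_(False)

    # Optimize only the remaining trainable parameters; the weight-decay
    # value is illustrative.
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(trainable, lr=0.1, momentum=0.9, weight_decay=5e-4)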