An Exponential Learning Rate Schedule for Deep Learning

Authors: Zhiyuan Li, Sanjeev Arora

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 4, we experimentally verify our theoretical findings on CNNs and ResNets. We also construct better exponential LR schedules by incorporating the Cosine LR schedule on CIFAR10, which opens the possibility of even more general theory of rate schedule tuning towards better performance."
Researcher Affiliation | Academia | Zhiyuan Li (Princeton University, zhiyuanli@cs.princeton.edu); Sanjeev Arora (Princeton University and Institute for Advanced Study, arora@cs.princeton.edu)
Pseudocode | No | The paper describes the SGD with Momentum and Weight Decay algorithm using mathematical formulas (Definition 1.2), but does not present it in a pseudocode block or algorithm box; a minimal sketch of the update rule appears after this table.
Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available.
Open Datasets | Yes | "We train PreResNet32 on CIFAR10. The initial learning rate is 0.1 and the momentum is 0.9 in all settings."
Dataset Splits | No | The paper mentions CIFAR10 as the dataset used for training and experiments, but does not specify the exact training/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2017)" but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Settings: We train PreResNet32 on CIFAR10. The initial learning rate is 0.1 and the momentum is 0.9 in all settings. We fix all the scalar and bias of BN, because otherwise they together with the following conv layer grow exponentially, sometimes exceeding the range of Float32 when trained with large growth rate for a long time. We fix the parameters in the last fully connected layer for scale invariance of the objective."
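
As noted in the Pseudocode row, the paper states the SGD with Momentum and Weight Decay update only as formulas (Definition 1.2). The following is a minimal sketch in PyTorch, assuming a common velocity formulation of that update; the paper's exact bookkeeping may differ in minor details, and the function name and default hyperparameters are illustrative, not taken from the paper.

    import torch

    def sgd_momentum_wd_step(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
        """One in-place step of SGD with momentum and weight decay.

        Sketch of the update written out in formulas in Definition 1.2:
            v <- momentum * v - lr * (grad + weight_decay * w)
            w <- w + v
        The weight_decay default here is illustrative, not from the paper.
        """
        v.mul_(momentum).add_(grad + w * weight_decay, alpha=-lr)
        w.add_(v)
        return w, v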
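The Research Type row quotes the paper's central experimental claim: on scale-invariant networks, SGD with weight decay at a fixed learning rate behaves like SGD without weight decay under an exponentially growing learning rate. The sketch below shows one way to run such a growing schedule in PyTorch using torch.optim.lr_scheduler.ExponentialLR; the gamma > 1 value is illustrative only, since the paper derives the exact growth factor from the learning rate, momentum, and weight-decay constants.

    import torch

    # Hypothetical stand-in network; the paper's setting assumes a
    # scale-invariant model (e.g., one whose output is normalized by BN).
    model = torch.nn.Linear(10, 10)

    # No weight decay: the exponentially growing learning rate plays its
    # role. gamma = 1.0005 is illustrative, not the paper's derived rate.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=1.0005)

    for step in range(1000):
        opt.zero_grad()
        loss = model(torch.randn(32, 10)).pow(2).mean()  # stand-in loss
        loss.backward()
        opt.step()
        sched.step()  # learning rate multiplied by gamma after every step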
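Finally, the Experiment Setup row quotes the freezing of the BN scale/bias parameters and of the last fully connected layer. A minimal sketch of that preparation in PyTorch, assuming a stock torchvision ResNet as a stand-in for the paper's PreResNet32 (the architecture and the weight-decay value below are illustrative assumptions):

    import torch
    import torchvision

    # Stand-in architecture, not the paper's PreResNet32.
    model = torchvision.models.resnet18(num_classes=10)

    # Freeze all BatchNorm scale (weight) and bias parameters, as the
    # paper does to keep them from growing exponentially during training.
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.weight.requires_grad_(False)
            m.bias.requires_grad_(False)

    # Freeze the last fully connected layer so the objective is
    # scale-invariant in the remaining parameters.
    for p in model.fc.parameters():
        p.requires_grad_(False)

    # Optimize only the remaining trainable parameters; the weight-decay
    # value is illustrative.
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(trainable, lr=0.1, momentum=0.9, weight_decay=5e-4)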