An Exponential Learning Rate Schedule for Deep Learning
Authors: Zhiyuan Li, Sanjeev Arora
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we experimentally verify our theoretical findings on CNNs and ResNets. We also construct better exponential LR schedules by incorporating the Cosine LR schedule on CIFAR10, which opens the possibility of an even more general theory of LR schedule tuning towards better performance. (An illustrative exponentially growing LR schedule is sketched below the table.) |
| Researcher Affiliation | Academia | Zhiyuan Li, Princeton University, zhiyuanli@cs.princeton.edu; Sanjeev Arora, Princeton University and Institute for Advanced Study, arora@cs.princeton.edu |
| Pseudocode | No | The paper describes the SGD with Momentum and Weight Decay algorithm using mathematical formulas (Definition 1.2), but does not present it in a pseudocode block or algorithm box. (A generic formulation of this update is sketched below the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available. |
| Open Datasets | Yes | We train PreResNet32 on CIFAR10. The initial learning rate is 0.1 and the momentum is 0.9 in all settings. |
| Dataset Splits | No | The paper mentions 'CIFAR10' as the dataset used for training and experiments but does not specify the exact training/validation/test splits (e.g., percentages or counts of samples). |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Settings: We train PreResNet32 on CIFAR10. The initial learning rate is 0.1 and the momentum is 0.9 in all settings. We fix all the scale and bias parameters of BN, because otherwise they, together with the following conv layer, grow exponentially, sometimes exceeding the range of Float32 when trained with a large growth rate for a long time. We fix the parameters in the last fully connected layer for scale invariance of the objective. (A PyTorch sketch of this setup appears below the table.) |
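
On the Research Type row: the paper derives the growth rate of its exponential LR schedules from the weight-decay and momentum hyperparameters, but the basic mechanic of an exponentially increasing learning rate can be sketched with PyTorch's built-in `ExponentialLR` scheduler using a multiplicative factor greater than 1. The factor 1.01 below is an arbitrary illustration, not a value from the paper.

```python
import torch

# Toy parameters; in the paper the growth factor is derived from the
# weight-decay and momentum settings, not chosen freely as it is here.
params = [torch.nn.Parameter(torch.randn(10))]
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)

# gamma > 1 makes the learning rate grow exponentially: lr_t = 0.1 * 1.01**t
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.01)

for epoch in range(5):
    # ... training steps would go here ...
    scheduler.step()
    print(epoch, scheduler.get_last_lr())
```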
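
On the Pseudocode row: the optimizer of Definition 1.2 is given only as formulas. A generic heavy-ball formulation of SGD with momentum $\gamma$, weight decay $\lambda$, and learning rate $\eta$ is written below for reference; the paper's exact notation may differ.

```latex
% Generic SGD with momentum (heavy ball) and weight decay; notation may
% differ from the paper's Definition 1.2.
\begin{aligned}
v_{t+1} &= \gamma\, v_t - \eta \left( \nabla \mathcal{L}(w_t) + \lambda\, w_t \right) \\
w_{t+1} &= w_t + v_{t+1}
\end{aligned}
```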
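
On the Experiment Setup row: a minimal PyTorch sketch of how the described configuration could be reproduced. The paper releases no code, so `PreResNet32` below is a placeholder for any pre-activation ResNet-32 implementation, and the data pipeline is a standard CIFAR10 loader rather than the authors'; the batch size is likewise an assumption, as it is not given in the excerpt.

```python
import torch
from torchvision import datasets, transforms

from models import PreResNet32  # placeholder import; any PreAct-ResNet-32 will do

model = PreResNet32(num_classes=10)

# Fix the BN scale/bias and the final fully connected layer, as the setup
# describes, so the trainable part of the objective stays scale-invariant.
for m in model.modules():
    if isinstance(m, (torch.nn.BatchNorm2d, torch.nn.Linear)):
        for p in m.parameters():
            p.requires_grad_(False)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)  # initial LR 0.1, momentum 0.9

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True)  # batch size not specified in the excerpt
```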