Acceleration via Fractal Learning Rate Schedules
Authors: Naman Agarwal, Surbhi Goel, Cyril Zhang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide some experiments to challenge conventional beliefs about stable learning rates in deep learning: the fractal schedule enables training to converge with locally unstable updates which make negative progress on the objective. |
| Researcher Affiliation | Industry | ¹Google AI Princeton, Princeton, NJ, USA; ²Microsoft Research, New York, NY, USA. Correspondence to: Cyril Zhang <cyrilzhang@microsoft.com>. |
| Pseudocode | No | The paper defines constructions and outlines procedures but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | As an invitation to try these ideas in various experimental settings, we provide in Appendix A some Python code to generate Chebyshev learning rates and fractal schedules. |
| Open Datasets | Yes | Figure 5 shows training curves for logistic regression for MNIST classification; details are in Appendix F.3. ... Figure 6: ResNet-18/CIFAR-10 training with batch size 8192 and a repeated T = 8 fractal Chebyshev schedule. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets, which typically have standard splits, but it does not explicitly provide specific percentages, sample counts, or detailed splitting methodologies for training, validation, or test sets within the paper text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used, such as GPU or CPU models, for running its experiments. |
| Software Dependencies | No | The paper mentions 'Python code' in Appendix A but does not provide specific version numbers for Python or any other key software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | Figure 6: ResNet-18/CIFAR-10 training with batch size 8192 and a repeated T = 8 fractal Chebyshev schedule. |
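
The Open Source Code row refers to Python code for generating Chebyshev learning rates and fractal schedules. The sketch below is not the authors' Appendix A code. It is a minimal reconstruction under stated assumptions: the step sizes are the reciprocals of the roots of the degree-T Chebyshev polynomial shifted to an assumed eigenvalue range `[m, M]`, and the ordering is built by repeatedly interlacing a permutation with its mirror image, which for T = 8 gives the order 1, 8, 4, 5, 2, 7, 3, 6. The function names and the parameters `m`, `M`, and `T` are hypothetical.

```python
import math


def chebyshev_steps(m, M, T):
    """Return T step sizes 1/gamma_t, where gamma_t are the roots of the
    degree-T Chebyshev polynomial shifted to [m, M] (assumed bounds on the
    Hessian eigenvalues). Steps come out sorted from largest to smallest."""
    center, radius = (M + m) / 2.0, (M - m) / 2.0
    return [
        1.0 / (center - radius * math.cos((t + 0.5) * math.pi / T))
        for t in range(T)
    ]


def fractal_permutation(T):
    """Return a 0-indexed ordering of range(T), T a power of two, built by
    interlacing the current permutation with its mirrored copy."""
    perm = [0]
    while len(perm) < T:
        n = len(perm)
        mirrored = [2 * n - 1 - p for p in perm]
        # Interlace: perm[0], mirrored[0], perm[1], mirrored[1], ...
        perm = [x for pair in zip(perm, mirrored) for x in pair]
    return perm


def fractal_chebyshev_schedule(m, M, T):
    """Chebyshev step sizes reordered by the fractal permutation."""
    if T < 1 or T & (T - 1):
        raise ValueError("T must be a power of two")
    steps = chebyshev_steps(m, M, T)
    return [steps[i] for i in fractal_permutation(T)]


if __name__ == "__main__":
    # Illustrative values only; m and M are not taken from the paper.
    for eta in fractal_chebyshev_schedule(m=0.1, M=1.0, T=8):
        print(f"{eta:.4f}")
```

A repeated T = 8 schedule, as in the Figure 6 caption, would cycle through the returned list every eight steps. How `m` and `M` should be chosen for a deep network is not stated in this summary, so those values are left to the reader.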