Learning Rate Schedules in the Presence of Distribution Shift
Authors: Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret. In Section 6, we present experiments to study the effect of the proposed learning rate schedules, including high-dimensional regression and a medical application to flow cytometry. |
| Researcher Affiliation | Collaboration | Matthew Fahrbach 1 Adel Javanmard 1 2 Vahab Mirrokni 1 Pratik Worah 1 1Google Research 2University of Southern California. |
| Pseudocode | Yes | Algorithm 1 Optimal learning rate schedule for linear regression undergoing distribution shift. |
| Open Source Code | Yes | The source code is available at https://github.com/fahrbach/learning-rate-schedules. |
| Open Datasets | Yes | We use the pancreatic RNA expression data in (Bastidas-Ponce et al., 2019; Bergen et al., 2020).3 This data is available at https://scvelo.readthedocs.io/scvelo.datasets.pancreas/. |
| Dataset Splits | No | The paper describes data generation and distribution shifts, but does not explicitly provide training/validation/test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like 'TensorFlow (Abadi et al., 2016)', 'Keras (Chollet et al., 2015)', and 'Adam (Kingma & Ba, 2014)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Each step uses a batch of B_t = 64 new examples to simulate the data stream. We optimize this model in an online manner using Adam (Kingma & Ba, 2014) for different initial learning rates and by optionally resetting its parameters at the beginning of a distribution shift. An initial learning rate of 0.1 for Adam caused the model to diverge, but the total regret is minimized with an initial learning rate of 0.01, achieving less regret than η0 ∈ {0.001, 0.003, 0.03}. |
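
The Open Datasets row points to the pancreatic RNA expression data distributed with scVelo. A minimal sketch of how one might fetch that dataset via scVelo's built-in loader is shown below; the snippet is illustrative and is not taken from the paper's released code.

```python
# Illustrative sketch: load the pancreatic endocrinogenesis dataset via scVelo.
# Assumption: scVelo's bundled loader `scv.datasets.pancreas()` is used; the paper
# only cites the data source and does not prescribe this loading code.
import scvelo as scv

adata = scv.datasets.pancreas()  # downloads and caches the pancreas AnnData object
print(adata)                     # RNA expression matrix with spliced/unspliced layers
```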
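
The Experiment Setup row describes an online training loop: batches of B_t = 64 streamed examples, Adam with several candidate initial learning rates, an optional parameter reset at each distribution shift, and cumulative regret as the evaluation metric. The following is a minimal sketch of such a loop in TensorFlow/Keras (the frameworks cited by the paper). The model architecture, synthetic data stream, shift times, step count, and the per-step regret proxy are placeholders of my own, not the paper's setup.

```python
# Minimal sketch of an online training loop with optional resets at distribution
# shifts. Assumptions (not from the paper): the linear model, the synthetic
# `sample_batch` stream, SHIFT_STEPS, STEPS, and using per-step loss as a regret proxy.
import numpy as np
import tensorflow as tf

BATCH_SIZE = 64                  # B_t = 64 new examples per step (from the paper)
STEPS = 1000                     # hypothetical stream length
SHIFT_STEPS = {250, 500, 750}    # hypothetical distribution-shift times
RESET_ON_SHIFT = True            # the paper optionally resets parameters at a shift
DIM = 10                         # hypothetical feature dimension

def sample_batch(step):
    """Hypothetical data stream: a linear model whose coefficients shift over time."""
    rng = np.random.default_rng(step)
    num_shifts = sum(s <= step for s in SHIFT_STEPS)
    w_true = np.ones(DIM) * (1.0 + 0.5 * num_shifts)
    X = rng.normal(size=(BATCH_SIZE, DIM)).astype(np.float32)
    y = (X @ w_true + 0.1 * rng.normal(size=BATCH_SIZE)).astype(np.float32)
    return X, y[:, None]

def make_model():
    return tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(DIM,))])

def run(initial_lr):
    model = make_model()
    optimizer = tf.keras.optimizers.Adam(learning_rate=initial_lr)
    loss_fn = tf.keras.losses.MeanSquaredError()
    cumulative_regret = 0.0
    for t in range(STEPS):
        if RESET_ON_SHIFT and t in SHIFT_STEPS:
            model = make_model()  # reset parameters at the start of a shift
            optimizer = tf.keras.optimizers.Adam(learning_rate=initial_lr)
        X, y = sample_batch(t)
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(X, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        cumulative_regret += float(loss)  # simplified proxy, not the paper's regret
    return cumulative_regret

# Sweep the initial learning rates mentioned in the Experiment Setup row.
for lr in [0.001, 0.003, 0.01, 0.03, 0.1]:
    print(lr, run(lr))
```

Under this sketch, one would compare the cumulative regret across the swept initial learning rates, mirroring the paper's observation that 0.01 performs best while 0.1 causes divergence.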