Learning Rate Schedules in the Presence of Distribution Shift
Authors: Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret. In Section 6, we present experiments to study the effect of the proposed learning rate schedules, including high-dimensional regression and a medical application to flow cytometry. |
| Researcher Affiliation | Collaboration | Matthew Fahrbach 1 Adel Javanmard 1 2 Vahab Mirrokni 1 Pratik Worah 1 1Google Research 2University of Southern California. |
| Pseudocode | Yes | Algorithm 1 Optimal learning rate schedule for linear regression undergoing distribution shift. |
| Open Source Code | Yes | The source code is available at https://github.com/fahrbach/learning-rate-schedules. |
| Open Datasets | Yes | We use the pancreatic RNA expression data in (Bastidas-Ponce et al., 2019; Bergen et al., 2020).3 This data is available at https://scvelo.readthedocs.io/scvelo.datasets.pancreas/. |
| Dataset Splits | No | The paper describes data generation and distribution shifts, but does not explicitly provide training/validation/test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like 'TensorFlow (Abadi et al., 2016)', 'Keras (Chollet et al., 2015)', and 'Adam (Kingma & Ba, 2014)' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Each step uses a batch of B_t = 64 new examples to simulate the data stream. We optimize this model in an online manner using Adam (Kingma & Ba, 2014) for different initial learning rates and by optionally resetting its parameters at the beginning of a distribution shift. An initial learning rate of 0.1 for Adam caused the model to diverge, but the total regret is minimized with an initial learning rate of 0.01, achieving less regret than η0 ∈ {0.001, 0.003, 0.03}. |
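
The Open Datasets row points to the pancreatic RNA expression data distributed with scVelo. A minimal sketch of how one might fetch that dataset via scVelo's built-in loader is shown below; the snippet is illustrative and is not taken from the paper's released code.

```python
# Illustrative sketch: load the pancreatic endocrinogenesis dataset via scVelo.
# Assumption: scVelo's bundled loader `scv.datasets.pancreas()` is used; the paper
# only cites the data source and does not prescribe this loading code.
import scvelo as scv

adata = scv.datasets.pancreas()  # downloads and caches the pancreas AnnData object
print(adata)                     # RNA expression matrix with spliced/unspliced layers
```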
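
The Experiment Setup row describes an online training loop: batches of B_t = 64 streamed examples, Adam with several candidate initial learning rates, an optional parameter reset at each distribution shift, and cumulative regret as the evaluation metric. The following is a minimal sketch of such a loop in TensorFlow/Keras (the frameworks cited by the paper). The model architecture, synthetic data stream, shift times, step count, and the per-step regret proxy are placeholders of my own, not the paper's setup.

```python
# Minimal sketch of an online training loop with optional resets at distribution
# shifts. Assumptions (not from the paper): the linear model, the synthetic
# `sample_batch` stream, SHIFT_STEPS, STEPS, and using per-step loss as a regret proxy.
import numpy as np
import tensorflow as tf

BATCH_SIZE = 64                  # B_t = 64 new examples per step (from the paper)
STEPS = 1000                     # hypothetical stream length
SHIFT_STEPS = {250, 500, 750}    # hypothetical distribution-shift times
RESET_ON_SHIFT = True            # the paper optionally resets parameters at a shift
DIM = 10                         # hypothetical feature dimension

def sample_batch(step):
    """Hypothetical data stream: a linear model whose coefficients shift over time."""
    rng = np.random.default_rng(step)
    num_shifts = sum(s <= step for s in SHIFT_STEPS)
    w_true = np.ones(DIM) * (1.0 + 0.5 * num_shifts)
    X = rng.normal(size=(BATCH_SIZE, DIM)).astype(np.float32)
    y = (X @ w_true + 0.1 * rng.normal(size=BATCH_SIZE)).astype(np.float32)
    return X, y[:, None]

def make_model():
    return tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(DIM,))])

def run(initial_lr):
    model = make_model()
    optimizer = tf.keras.optimizers.Adam(learning_rate=initial_lr)
    loss_fn = tf.keras.losses.MeanSquaredError()
    cumulative_regret = 0.0
    for t in range(STEPS):
        if RESET_ON_SHIFT and t in SHIFT_STEPS:
            model = make_model()  # reset parameters at the start of a shift
            optimizer = tf.keras.optimizers.Adam(learning_rate=initial_lr)
        X, y = sample_batch(t)
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(X, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        cumulative_regret += float(loss)  # simplified proxy, not the paper's regret
    return cumulative_regret

# Sweep the initial learning rates mentioned in the Experiment Setup row.
for lr in [0.001, 0.003, 0.01, 0.03, 0.1]:
    print(lr, run(lr))
```

Under this sketch, one would compare the cumulative regret across the swept initial learning rates, mirroring the paper's observation that 0.01 performs best while 0.1 causes divergence.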