AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

Authors: Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantages and the generality of AutoLRS through extensive experiments of training DNNs for tasks from diverse domains using different optimizers.
Researcher Affiliation | Collaboration | Yuchen Jin, Tianyi Zhou, Liangyu Zhao (University of Washington) {yuchenj, tianyizh, liangyu}@cs.washington.edu; Yibo Zhu, Chuanxiong Guo (ByteDance Inc.) {zhuyibo, guochuanxiong}@bytedance.com; Marco Canini (KAUST) marco@kaust.edu.sa; Arvind Krishnamurthy (University of Washington) arvind@cs.washington.edu
Pseudocode | Yes | Algorithm 1: AutoLRS. Input: (1) number of steps in each training stage, τ; (2) learning-rate search interval (ηmin, ηmax); (3) number of LRs to evaluate by BO in each training stage, k; (4) number of training steps to evaluate each LR in BO, τ'; (5) trade-off weight in the acquisition function of BO, κ. (A minimal sketch of this per-stage BO loop is given after the table.)
Open Source Code | Yes | The AutoLRS implementation is available at https://github.com/YuchenJin/autolrs.
Open Datasets | Yes | ResNet-50 (He et al., 2016a) on ImageNet classification (Russakovsky et al., 2015); Transformer (Vaswani et al., 2017) and BERT (Devlin et al., 2019) for NLP tasks. We train ResNet-50 on ImageNet (Russakovsky et al., 2015) using SGD with momentum on 32 NVIDIA Tesla V100 GPUs with data parallelism and a mini-batch size of 1024.
Dataset Splits | Yes | AutoLRS aims to find an LR applied to every τ steps that minimizes the resulting validation loss.
Hardware Specification | Yes | We train ResNet-50 on ImageNet (Russakovsky et al., 2015) using SGD with momentum on 32 NVIDIA Tesla V100 GPUs with data parallelism and a mini-batch size of 1024.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify a version for PyTorch or any other software dependency.
Experiment Setup | Yes | In our default setting, we set k = 10 and τ' = τ/10 so that the training steps spent on BO equal the training steps spent on updating the DNN model. We start from τ = 1000 and τ' = 100 and double τ and τ' after each stage until τ reaches τmax. We use τmax = 8000 for ResNet-50 and Transformer and τmax = 32000 for BERT. (A small helper illustrating this stage schedule is sketched after the table.)
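
For concreteness, below is a minimal Python sketch of the per-stage BO learning-rate search that Algorithm 1 describes. It is not the authors' implementation: val_loss_after_short_run is a synthetic stand-in for training τ' steps at a candidate LR and measuring validation loss, the Gaussian-process surrogate and lower-confidence-bound acquisition (with trade-off weight κ) follow a standard BO recipe rather than the paper's exact settings, and the real AutoLRS additionally forecasts the post-stage loss from the short rollout instead of using the short-run loss directly.

# Sketch of one AutoLRS training stage: pick an LR by Bayesian optimization,
# then train tau steps with it.  Names and the synthetic loss surface below
# are illustrative stand-ins, not the authors' code.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def val_loss_after_short_run(log_lr: float) -> float:
    """Stand-in for training tau' steps at exp(log_lr) from the current
    checkpoint and measuring validation loss; replace with real training."""
    return (log_lr + 5.0) ** 2 + 0.05 * np.random.randn()


def bo_pick_lr(eta_min=1e-6, eta_max=1e-1, k=10, kappa=1.0, seed=0):
    """Evaluate k candidate LRs in (eta_min, eta_max) with short rollouts and
    return the LR with the lowest observed validation loss."""
    rng = np.random.default_rng(seed)
    lo, hi = np.log(eta_min), np.log(eta_max)
    X, y = [], []
    for i in range(k):
        if i < 3:
            # A few random points to seed the GP surrogate.
            x = rng.uniform(lo, hi)
        else:
            gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                          normalize_y=True)
            gp.fit(np.array(X).reshape(-1, 1), np.array(y))
            grid = np.linspace(lo, hi, 256).reshape(-1, 1)
            mu, sigma = gp.predict(grid, return_std=True)
            # Lower-confidence-bound acquisition for loss minimization,
            # with trade-off weight kappa (Algorithm 1, input 5).
            x = float(grid[np.argmin(mu - kappa * sigma)])
        X.append(x)
        y.append(val_loss_after_short_run(x))
    best = X[int(np.argmin(y))]
    return float(np.exp(best))


if __name__ == "__main__":
    lr = bo_pick_lr()
    print(f"LR chosen for the next tau training steps: {lr:.2e}")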
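
The stage-length schedule quoted in the Experiment Setup row can also be written down in a few lines. This helper is an illustration of the described schedule (start with τ = 1000, keep τ' = τ/10 as in the default setting, double after each stage, cap τ at τmax); the function name stage_schedule and the step budget in the example are made up for illustration, not taken from the AutoLRS repository.

# Stage-length schedule implied by the setup above: tau starts at 1000,
# tau' = tau / 10, both double after each stage until tau reaches tau_max
# (8000 for ResNet-50 and Transformer, 32000 for BERT).
def stage_schedule(total_steps, tau=1000, tau_max=8000):
    """Yield (tau, tau') for each training stage until the step budget runs out."""
    done = 0
    while done < total_steps:
        yield tau, tau // 10
        done += tau
        tau = min(2 * tau, tau_max)


# Example: the first few stages for ResNet-50 (tau_max = 8000).
for i, (t, tp) in zip(range(6), stage_schedule(total_steps=40000)):
    print(f"stage {i}: tau = {t}, tau' = {tp}")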