Learning to Schedule Learning Rate with Graph Neural Networks
Authors: Yuanhao Xiong, Li-Cheng Lan, Xiangning Chen, Ruochen Wang, Cho-Jui Hsieh
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework on benchmarking datasets, Fashion-MNIST and CIFAR10 for image classification, and GLUE for language understanding. GNS shows consistent improvement over popular baselines when training CNN and Transformer models. Moreover, GNS demonstrates great generalization to different datasets and network structures. Our code is available at https://github.com/xyh97/GNS. ... To validate the effectiveness of GNS, we evaluate our method on various tasks in image classification and language understanding, and compare it with popular learning rate scheduling rules. We further investigate the generalization of GNS on different transfer tasks. In addition, we conduct an ablation study to analyze the state representation and reward collection. |
| Researcher Affiliation | Academia | Yuanhao Xiong, Li-Cheng Lan, Xiangning Chen, Ruochen Wang, Cho-Jui Hsieh Department of Computer Science, UCLA {yhxiong, lclan, xiangning, chohsieh}@cs.ucla.edu ruocwang@ucla.edu |
| Pseudocode | Yes | Algorithm 1 Graph Network-based Scheduler. Input: Value network parameterized by φ, action network parameterized by ϕ, # updates T, decision interval K, prior learning rate distribution Dα [see the first sketch after this table] |
| Open Source Code | Yes | Our code is available at https://github.com/xyh97/GNS. |
| Open Datasets | Yes | Image classification. We consider two benchmark datasets in image classification, Fashion-MNIST (Xiao et al., 2017) and CIFAR10 (Krizhevsky et al., 2014). ... Language understanding. For language understanding, we conduct experiments on GLUE (Wang et al., 2019), a benchmark consisting of eight sentence- or sentence-pair tasks. |
| Dataset Splits | Yes | These two datasets are first split into the standard training and test sets. Then we randomly sample 10k images for each dataset from the training set to construct a validation set. ... For language understanding, we conduct experiments on GLUE (Wang et al., 2019), a benchmark consisting of eight sentence- or sentence-pair tasks. They are divided into training, validation and test sets and we have no access to ground truth labels of test sets. [see the second sketch after this table] |
| Hardware Specification | Yes | For instance, when running on MRPC for 5 epochs, we need to make 58 decisions with the number of network updates K = 10. The average time of one episode of SRLS with one NVIDIA 1080Ti GPU is 405s while GNS only takes 259s, which decreases the original cost by 30%. |
| Software Dependencies | No | All RoBERTa models in this paper are implemented by Hugging Face (Wolf et al., 2020) and pre-trained models are obtained from the corresponding model hub. ... https://github.com/huggingface/transformers. The paper mentions Hugging Face and its transformers library, but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In this section, we present our experimental settings. Further details can be found in Appendix B. ... We use Adam (Kingma & Ba, 2014) with a batch size of 128 for 200 epochs to train these two image classification tasks. ... The AdamW (Loshchilov & Hutter, 2017) optimizer is adopted to train RoBERTa models. Details of other hyperparameters like batch size and episode length for each task are provided in Appendix B. ... Table 7: Hyperparameter configuration for GLUE benchmarking datasets. [see the third sketch after this table] |
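
For concreteness, below is a minimal sketch of the per-episode decision loop implied by Algorithm 1's inputs: every K optimizer updates, an action network reads a summary of the training state and rescales the learning rate. This is a hedged illustration, not the authors' method: the paper encodes the trained network as a graph and uses a GNN, whereas here `encode_state` substitutes crude per-layer statistics; `ActionNet`, `encode_state`, and `run_episode` are hypothetical names, and the value network (used only to train the scheduler) is omitted.

```python
# Hypothetical sketch of the decision loop suggested by Algorithm 1's inputs
# (# updates T, decision interval K, an initial LR from a prior D_alpha).
import torch
import torch.nn as nn

class ActionNet(nn.Module):
    """Placeholder policy head: maps a state vector to a log-scale LR change."""
    def __init__(self, state_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, state):
        # Bounded log-factor so the learning rate changes smoothly.
        return torch.tanh(self.mlp(state)) * 0.1

def encode_state(model, loss):
    # Stand-in for the paper's GNN state encoding: mean absolute weight
    # per parameter tensor, plus the current loss value.
    stats = [p.detach().abs().mean() for p in model.parameters()]
    return torch.stack(stats + [loss.detach()])

def run_episode(model, opt, loss_fn, batches, action_net, T=100, K=10, lr0=1e-3):
    """Train for T updates; every K updates the scheduler adjusts the LR."""
    lr = lr0  # Algorithm 1 draws the initial LR from a prior D_alpha instead.
    for g in opt.param_groups:
        g["lr"] = lr
    for t in range(T):
        x, y = batches[t % len(batches)]
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if (t + 1) % K == 0:  # decision interval K
            with torch.no_grad():
                state = encode_state(model, loss)
                lr *= float(torch.exp(action_net(state)))
            for g in opt.param_groups:
                g["lr"] = lr
    return lr
```

A caller would construct `action_net = ActionNet(state_dim=len(list(model.parameters())) + 1)` so the state dimension matches `encode_state`.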
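The validation-split protocol quoted in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch assuming torchvision and CIFAR10 (the same procedure would apply to Fashion-MNIST); the fixed seed is our assumption, not the paper's.

```python
# Sketch of the quoted split: standard train/test split, then 10k images
# randomly held out from the training set as a validation set.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root="data", train=False, download=True,
                            transform=transforms.ToTensor())

# Hold out 10k of the 50k training images for validation (seed assumed).
train_set, val_set = random_split(
    train_full, [len(train_full) - 10_000, 10_000],
    generator=torch.Generator().manual_seed(0))
```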
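Finally, the stated image-classification setup (Adam, batch size 128, 200 epochs) maps to a short training loop. In this sketch `lr=1e-3` is a placeholder, since under GNS the scheduler adjusts the learning rate during training, and the model is whatever CNN is being scheduled; remaining hyperparameters live in the paper's Appendix B.

```python
# Minimal sketch of the stated setup: Adam, batch size 128, 200 epochs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_set, epochs=200, batch_size=128, lr=1e-3):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```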