Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Authors: Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further conduct experiments on various popular tasks and models, which is often insufficient in previous work. Experimental results show that new variants can eliminate the generalization gap between adaptive methods and SGD and maintain higher learning speed early in training at the same time."
Researcher Affiliation | Collaboration | "MOE Key Lab of Computational Linguistics, School of EECS, Peking University; College of Information Science and Electronic Engineering, Zhejiang University; Department of Computer Science, University of Southern California; Center for Data Science, Beijing Institute of Big Data Research, Peking University; {luolc,xusun}@pku.edu.cn; xiongyh@zju.edu.cn; yanliu.cs@usc.edu. Equal contribution. This work was done when the first and second authors were on an internship at DiDi AI Labs."
Pseudocode | Yes | "Algorithm 1: Generic framework of optimization methods" (an illustrative sketch follows the table)
Open Source Code | Yes | "The implementation of the algorithm can be found at https://github.com/Luolc/AdaBound."
Open Datasets | Yes | "We focus on three tasks: the MNIST image classification task (Lecun et al., 1998), the CIFAR-10 image classification task (Krizhevsky & Hinton, 2009), and the language modeling task on Penn Treebank (Marcus et al., 1993)."
Dataset Splits | No | "Adaptive methods often display faster progress in the initial portion of the training, but their performance quickly plateaus on the unseen data (development/test set) (Wilson et al., 2017)."
Hardware Specification | No | "We focus on three tasks: the MNIST image classification task (Lecun et al., 1998), the CIFAR-10 image classification task (Krizhevsky & Hinton, 2009), and the language modeling task on Penn Treebank (Marcus et al., 1993)."
Software Dependencies | No | "The implementation of the algorithm can be found at https://github.com/Luolc/AdaBound."
Experiment Setup | Yes | "To tune the step size, we follow the method in Wilson et al. (2017). We implement a logarithmically-spaced grid of five step sizes. If the best performing parameter is at one of the extremes of the grid, we will try new grid points so that the best performing parameters are at one of the middle points in the grid. Specifically, we tune over hyperparameters in the following way." (a sketch of this grid procedure follows the table)
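Note on the Pseudocode row: Algorithm 1 of the paper is a generic framework whose update is x_{t+1} = x_t - alpha_t * m_t / sqrt(V_t), and AdaBound is obtained by clipping the element-wise step size between bounds that converge to a constant final learning rate. The sketch below is a minimal NumPy illustration of one such bounded update, not the authors' reference code; the bound schedule and the hyperparameter names final_lr and gamma are assumptions modeled on the linked repository.

    import numpy as np

    def adabound_step(x, g, m, v, t, alpha=1e-3, final_lr=0.1,
                      beta1=0.9, beta2=0.999, eps=1e-8, gamma=1e-3):
        # Adam-style moment estimates with bias correction.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Dynamic bounds: both converge to final_lr as t grows, so the
        # update gradually behaves like SGD with that constant rate.
        lower = final_lr * (1 - 1 / (gamma * t + 1))
        upper = final_lr * (1 + 1 / (gamma * t))
        # Clip the element-wise step size, then take the step.
        step = np.clip(alpha / (np.sqrt(v_hat) + eps), lower, upper)
        return x - step * m_hat, m, v

Called in a loop with t starting at 1, the clip is rarely active early on (the bounds are very wide), so the behavior is close to Adam; as t grows the step size is pinned near final_lr, mimicking SGD.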
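Note on the Experiment Setup row: the quoted tuning protocol (following Wilson et al., 2017) searches a logarithmically spaced grid of five step sizes and extends the grid whenever the best candidate lands on an edge. A hedged sketch of that loop is below; tune_step_size, the evaluate callback, the grid span, and the round limit are illustrative assumptions rather than details taken from the paper.

    import numpy as np

    def tune_step_size(evaluate, center=1e-3, span=2.0, num=5, max_rounds=5):
        # Search a log-spaced grid of `num` step sizes around `center`.
        # If the best candidate sits at either edge of the grid,
        # re-center the grid there and repeat, so the final winner
        # is an interior point of the grid.
        for _ in range(max_rounds):
            grid = np.logspace(np.log10(center) - span,
                               np.log10(center) + span, num)
            scores = [evaluate(lr) for lr in grid]
            best = int(np.argmax(scores))
            if 0 < best < num - 1:
                return grid[best]
            center = grid[best]
        return center

    # Example with a toy validation score that peaks at lr = 1e-2.
    best_lr = tune_step_size(lambda lr: -abs(np.log10(lr) + 2))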