On the Variance of the Adaptive Learning Rate and Beyond

Authors: Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on image classification, language modeling, and neural machine translation verify our intuition and demonstrate the efficacy and robustness of RAdam.
Researcher Affiliation | Collaboration | University of Illinois at Urbana-Champaign; Georgia Tech; Microsoft Dynamics 365 AI; Microsoft Research
Pseudocode | Yes | Algorithm 1: Generic adaptive optimization method setup; Algorithm 2: Rectified Adam. (A minimal sketch of the rectified update follows the table.)
Open Source Code | Yes | All implementations are available at: https://github.com/LiyuanLucasLiu/RAdam
Open Datasets | Yes | IWSLT'14 German-to-English translation dataset (Cettolo et al., 2014); One Billion Word (Chelba et al., 2013); CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009)
Dataset Splits | No | The paper uses standard datasets (CIFAR-10, ImageNet, IWSLT'14, WMT'16) that have well-defined splits, but it does not explicitly state training, validation, and test split percentages or sample counts.
Hardware Specification | Yes | All models are trained on one NVIDIA Tesla V100 GPU; we conduct training on one NVIDIA Tesla V100 GPU; we conduct training on four NVIDIA Quadro R8000 GPUs
Software Dependencies | No | The paper mentions using a 'public pytorch re-implementation' and the 'fairseq package' but does not specify version numbers or any other software dependencies.
Experiment Setup | Yes | For Adam and RAdam, we set β1 = 0.9, β2 = 0.999. For SGD, we set the momentum factor to 0.9. The weight decay rate is 10^-4. Random cropping and random horizontal flipping are applied to training data. (A hedged configuration sketch follows the table.)
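
The Pseudocode row points to Algorithm 2 (Rectified Adam). Below is a minimal, single-parameter Python sketch of that update written from the paper's description; the function name, scalar-state formulation, and the epsilon term are illustrative choices, not the repository's implementation.

```python
import math

def radam_update(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One rectified-Adam step for a single scalar parameter (illustrative sketch).

    `t` is the 1-based step count; variable names (rho_inf, rho_t, r_t)
    mirror the paper's notation in Algorithm 2.
    """
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment

    # Length of the approximated simple moving average (SMA).
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)

    if rho_t > 4.0:
        # Variance of the adaptive learning rate is tractable: rectify it.
        v_hat = math.sqrt(v / (1 - beta2 ** t))
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        param = param - lr * r_t * m_hat / (v_hat + eps)
    else:
        # Early steps: fall back to un-adapted SGD with momentum.
        param = param - lr * m_hat

    return param, m, v
```

For the first few steps rho_t ≤ 4, so the update falls back to momentum SGD; this is the warmup-like behavior the rectification term is designed to provide.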
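
The Experiment Setup row quotes the image-classification hyperparameters. The snippet below is a hedged sketch of how that setup might look in PyTorch; the learning rates, crop padding, and stand-in model are placeholders not taken from the paper, and the commented-out RAdam import assumes the linked repository exposes an optimizer with the standard torch.optim.Adam constructor signature.

```python
import torch
import torchvision.transforms as T

# Training-data augmentation stated in the paper: random cropping and random
# horizontal flipping. The padding value is a common CIFAR-10 choice, not
# quoted in the paper.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

model = torch.nn.Linear(3 * 32 * 32, 10)  # stand-in for the actual network

# Hyperparameters quoted in the Experiment Setup row; learning rates are
# illustrative placeholders.
sgd = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
adam = torch.optim.Adam(model.parameters(), lr=1e-3,
                        betas=(0.9, 0.999), weight_decay=1e-4)

# RAdam from the authors' repository (https://github.com/LiyuanLucasLiu/RAdam);
# this assumes its constructor mirrors torch.optim.Adam, which is an assumption
# about the repo's API rather than something stated in the paper.
# from radam import RAdam
# radam = RAdam(model.parameters(), lr=1e-3,
#               betas=(0.9, 0.999), weight_decay=1e-4)
```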