reproducibilityindex.ai

Quasi-hyperbolic momentum and Adam for deep learning

Authors: Jerry Ma, Denis Yarats

ICLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE.
Researcher Affiliation	Collaboration	Jerry Ma, Facebook AI Research, Menlo Park, CA, USA (maj@fb.com); Denis Yarats, Facebook AI Research & New York University, New York, NY, USA (denisy@fb.com)
Pseudocode	No	The paper describes update rules using mathematical equations but does not provide structured pseudocode or clearly labeled algorithm blocks.
Open Source Code	Yes	Code is immediately available at https://github.com/facebookresearch/qhoptim/
Open Datasets	Yes	EMNIST digits (Cohen et al., 2017), CIFAR10 (Krizhevsky, 2009), ILSVRC2012 (Russakovsky et al., 2015), WikiText-103 (Merity et al., 2016), MuJoCo (Todorov et al., 2012), WMT16 EN-DE (Vaswani et al., 2017; Ott et al., 2018)
Dataset Splits	Yes	We train for 90 epochs with size-64 minibatches. Each parameterization is run 3 times with different seeds, and we report training loss, training top-1 error, and validation top-1 error. We use a step decay schedule for the learning rate: α ∈ {1, 0.1, 0.01}. That is, the first 30 epochs use α = 1.0, the next 30 epochs use α = 0.1, and the final 30 epochs use α = 0.01.
Hardware Specification	Yes	Experiments are run on a mix of NVIDIA P100 and V100 GPUs
Software Dependencies	Yes	All experiments use Python 3.7 and PyTorch 0.4.1 (Paszke et al., 2017). Experiments are run on a mix of NVIDIA P100 and V100 GPUs, along with a mix of CUDA 9.0 and 9.2.
Experiment Setup	Yes	We train for 90 epochs with size-64 minibatches. For QHM, we initialize α = 1 and decay it 10-fold every 30 epochs. The sweep grid for QHM... For QHAdam, we fix α = 10⁻³, ϵ = 10⁻⁸, ν2 = 1, and β2 = 0.999, and sweep over ν1 and β1.
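
The step decay schedule quoted in the Dataset Splits and Experiment Setup rows (α = 1 for the first 30 epochs, then a 10-fold decay every 30 epochs over 90 epochs) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' training code: the function name step_decay_lr and its parameter names are hypothetical, and epochs are assumed to be 0-indexed.

```python
def step_decay_lr(epoch, base_lr=1.0, decay_factor=0.1, epochs_per_step=30):
    """Return the learning rate for a 0-indexed epoch under step decay.

    Hypothetical helper: the name and parameters are assumptions made for
    illustration. The defaults mirror the quoted QHM setup (alpha starts at
    1 and is decayed 10-fold every 30 epochs).
    """
    # Integer division counts how many decay steps have already happened.
    num_decays = epoch // epochs_per_step
    return base_lr * decay_factor ** num_decays


# Learning rate for each of the 90 training epochs.
schedule = [step_decay_lr(epoch) for epoch in range(90)]

# Print the learning rate at the start of each 30-epoch stage.
for start in (0, 30, 60):
    print(f"epochs {start}-{start + 29}: alpha = {schedule[start]:.2f}")
```

Under these assumptions, epochs 0 to 29 use α = 1, epochs 30 to 59 use α = 0.1, and epochs 60 to 89 use α = 0.01, up to floating-point rounding.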