On the Adequacy of Untuned Warmup for Adaptive Optimization
Authors: Jerry Ma, Denis Yarats (pp. 8828-8836)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate untuned exponential warmup (Equation 13), untuned linear warmup (Equation 14), and RAdam across a variety of supervised machine learning tasks. For brevity, all experimental settings are summarized in the main text and comprehensively detailed in Appendix A. 5.1 Image Classification: Using each of the three warmup methods, we train a ResNet-50 model (He et al. 2016) on the ILSVRC (ImageNet) image classification dataset with various configurations of Adam. [...] Table 1 presents the top-1 error rates at the end of training for the three warmup methods. |
| Researcher Affiliation | Collaboration | Jerry Ma (1, 2) and Denis Yarats (3, 4): 1 Booth School of Business, University of Chicago; 2 U.S. Patent and Trademark Office, Department of Commerce; 3 Courant Institute of Mathematical Sciences, New York University; 4 Facebook AI Research |
| Pseudocode | No | The paper provides mathematical equations for optimization algorithms (Eqs. 1-11) but no explicitly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | No explicit statement or link for the open-source code for their methodology is provided. |
| Open Datasets | Yes | We train a ResNet-50 model (He et al. 2016) on the ILSVRC (ImageNet) image classification dataset, the EMNIST digit recognition task (Cohen et al. 2017), a state-of-the-art Transformer-based language model from Baevski and Auli (2018) on WIKITEXT-103, and a Transformer model (Vaswani et al. 2017) on the WMT16 English-German (EN-DE) dataset. These are all standard, publicly available datasets. |
| Dataset Splits | No | Appendix C.1 provides both training and validation metrics (Figures 7 and 8 respectively) for all tested configurations, reinforcing this trend. While validation is used, explicit percentages or sample counts for train/validation/test splits are not stated in the main text. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running experiments. |
| Software Dependencies | No | The paper mentions "PyTorch Examples" and "Automatic differentiation in PyTorch" in the references but does not specify version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We train a ResNet-50 model (He et al. 2016) on the ILSVRC (ImageNet) image classification dataset with various configurations of Adam. Specifically, we sweep over α (learning rate) ∈ {10⁻⁴, 10⁻³, 10⁻²} and β₂ ∈ {0.99, 0.997, 0.999}. [...] We sweep over the following grid of Adam hyperparameters: α (learning rate) ∈ {1×10⁻⁴, 3×10⁻⁴, 5×10⁻⁴} and β₂ ∈ {0.99, 0.998, 0.999}, with β₁ = 0.9 and ϵ = 10⁻⁷ fixed. |
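The rows above quote the paper's warmup rules (Equations 13 and 14) and its Adam sweep without reproducing the formulas. The sketch below is a minimal illustration of how such untuned schedules and the quoted sweep grid could be wired up; the exact functional forms are assumptions based on the commonly cited versions from the paper (linear ramp over 2/(1-β₂) steps; exponential approach at rate 1-β₂), not a verbatim reproduction of Equations 13-14.

```python
import itertools
import math

def untuned_exponential_warmup(t, beta2):
    # Assumed form of Eq. 13: factor approaches 1 at rate (1 - beta2).
    return 1.0 - math.exp(-(1.0 - beta2) * t)

def untuned_linear_warmup(t, beta2):
    # Assumed form of Eq. 14: linear ramp over 2 / (1 - beta2) steps.
    return min(1.0, 0.5 * (1.0 - beta2) * t)

def warmed_lr(alpha, t, beta2, schedule):
    # Effective learning rate at step t: base rate times the warmup factor.
    return alpha * schedule(t, beta2)

# Sweep grids quoted in the "Experiment Setup" row, with
# beta1 = 0.9 and eps = 1e-7 held fixed throughout.
imagenet_grid = list(itertools.product([1e-4, 1e-3, 1e-2],
                                       [0.99, 0.997, 0.999]))
second_grid = list(itertools.product([1e-4, 3e-4, 5e-4],
                                     [0.99, 0.998, 0.999]))

if __name__ == "__main__":
    # Show the warmed learning rate early in training for each ImageNet config.
    for alpha, beta2 in imagenet_grid:
        lr_early = warmed_lr(alpha, t=100, beta2=beta2,
                             schedule=untuned_linear_warmup)
        print(f"alpha={alpha:g} beta2={beta2} lr@t=100: {lr_early:.2e}")
```

Note that both schedules are parameter-free given β₂, which is the paper's central point: the warmup horizon falls out of the optimizer's own second-moment decay rate rather than being tuned separately.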