Convergence of Adam Under Relaxed Assumptions
Authors: Haochuan Li, Alexander Rakhlin, Ali Jadbabaie
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation (Adam) algorithm for a wide class of optimization objectives. The key to our analysis is a new proof of boundedness of gradients along the optimization trajectory of Adam, under a generalized smoothness assumption according to which the local smoothness (i.e., Hessian norm when it exists) is bounded by a sub-quadratic function of the gradient norm. Moreover, we propose a variance-reduced version of Adam with an accelerated gradient complexity of O(ϵ⁻³). |
| Researcher Affiliation | Academia | Haochuan Li MIT haochuan@mit.edu Alexander Rakhlin MIT rakhlin@mit.edu Ali Jadbabaie MIT jadbabai@mit.edu |
| Pseudocode | Yes | Algorithm 1 ADAM |
| Open Source Code | No | The paper mentions 'PyTorch implementation' as a default choice for λ, but does not provide a statement about releasing the authors' own code for the methodology or analysis described in this paper. |
| Open Datasets | Yes | Based on our preliminary experimental results on CIFAR-10 shown in Figure 1, the performance of Adam is not very sensitive to the choice of λ. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or detailed methodology) is provided for the CIFAR-10 dataset used in Figure 1. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) are mentioned for the experiments, only general statements like 'training deep neural networks' and 'training transformers'. |
| Software Dependencies | No | The paper mentions 'PyTorch implementation' but does not specify any version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Figure 1: Test errors of different models trained on CIFAR-10 using the Adam optimizer with β = 0.9, β_sq = 0.999, η = 0.001 and different λs. |
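
For reference, the generalized smoothness assumption quoted in the Research Type row can be summarized as below. This is a sketch based only on the abstract wording; the constants L_0, L_ρ and the exponent ρ < 2 are illustrative labels and may not match the paper's exact notation.

```latex
% Sketch of the sub-quadratic generalized smoothness condition described in the
% abstract. L_0, L_rho, and rho are illustrative names; the paper's precise
% assumption and constants may differ.
\[
  \bigl\lVert \nabla^2 f(x) \bigr\rVert \;\le\;
  L_0 + L_\rho \,\bigl\lVert \nabla f(x) \bigr\rVert^{\rho},
  \qquad 0 \le \rho < 2 .
\]
```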
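
The Pseudocode row refers to Algorithm 1 (Adam). A minimal Python sketch of a single Adam update is given below for orientation; it follows the standard form of the algorithm, with λ denoting the offset added to the denominator (PyTorch's `eps`), and is not a transcription of the paper's Algorithm 1.

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta=0.9, beta_sq=0.999, lam=1e-8):
    """One standard Adam update (sketch, not the paper's Algorithm 1 verbatim).
    `lam` is the offset added to the denominator (eps in torch.optim.Adam)."""
    m.mul_(beta).add_(grad, alpha=1 - beta)                   # first-moment estimate
    v.mul_(beta_sq).addcmul_(grad, grad, value=1 - beta_sq)   # second-moment estimate
    m_hat = m / (1 - beta ** t)                               # bias corrections
    v_hat = v / (1 - beta_sq ** t)
    param.add_(-lr * m_hat / (v_hat.sqrt() + lam))            # update with the lambda offset
    return param, m, v
```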
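
The Experiment Setup row specifies only the optimizer hyperparameters for Figure 1 (β = 0.9, β_sq = 0.999, η = 0.001, varying λ on CIFAR-10). A minimal sketch of how that configuration could be set up is shown below; the model architecture, batch size, λ grid, and training schedule are placeholders not stated in the paper, and λ is assumed to correspond to the `eps` argument of `torch.optim.Adam`.

```python
import torch
import torchvision
import torchvision.transforms as T

# CIFAR-10 is public; the split and preprocessing here are placeholders, not the paper's.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
criterion = torch.nn.CrossEntropyLoss()

for lam in (1e-8, 1e-4, 1e-2):                           # hypothetical lambda grid
    model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), eps=lam)
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        break  # one illustrative step; the paper does not state the training schedule
```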