Convergence of Adam Under Relaxed Assumptions

Authors: Haochuan Li, Alexander Rakhlin, Ali Jadbabaie

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation (Adam) algorithm for a wide class of optimization objectives. The key to our analysis is a new proof of boundedness of gradients along the optimization trajectory of Adam, under a generalized smoothness assumption according to which the local smoothness (i.e., Hessian norm when it exists) is bounded by a sub-quadratic function of the gradient norm. Moreover, we propose a variance-reduced version of Adam with an accelerated gradient complexity of O(ϵ^{-3}).
Researcher Affiliation | Academia | Haochuan Li (MIT, haochuan@mit.edu); Alexander Rakhlin (MIT, rakhlin@mit.edu); Ali Jadbabaie (MIT, jadbabai@mit.edu)
Pseudocode | Yes | Algorithm 1 ADAM (a hedged sketch of the standard Adam update is given after this table)
Open Source Code | No | The paper mentions the PyTorch implementation's default choice for λ, but does not state that the authors release their own code for the methodology or analysis described in this paper.
Open Datasets | Yes | Based on our preliminary experimental results on CIFAR-10 shown in Figure 1, the performance of Adam is not very sensitive to the choice of λ.
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, or detailed methodology) is provided for the CIFAR-10 dataset used in Figure 1.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) are mentioned for the experiments, only general statements like 'training deep neural networks' and 'training transformers'.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Figure 1: Test errors of different models trained on CIFAR-10 using the Adam optimizer with β = 0.9, β_sq = 0.999, η = 0.001 and different λs.
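
The generalized smoothness assumption quoted in the Research Type row can be illustrated by one common instantiation: the local smoothness (Hessian norm) is bounded by a sub-quadratic function of the gradient norm. The constants ℓ_0, ℓ_1 and the exponent ρ below are notation chosen for this sketch and are not necessarily the paper's exact formulation.

```latex
% Illustrative sub-quadratic generalized smoothness condition (sketch only):
% the Hessian norm grows at most like a power rho < 2 of the gradient norm.
\[
  \big\|\nabla^2 f(x)\big\| \;\le\; \ell_0 + \ell_1\,\big\|\nabla f(x)\big\|^{\rho},
  \qquad 0 \le \rho < 2 .
\]
```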
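
The Pseudocode row refers to Algorithm 1 (ADAM). Below is a minimal sketch of the standard Adam update in Python, assuming the paper's λ is the small constant added to the denominator (as in the PyTorch default mentioned above); bias correction and other details may differ from the paper's exact Algorithm 1.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-3, beta=0.9, beta_sq=0.999, lam=1e-8):
    """One Adam update on parameters `theta`.

    A sketch of the standard update rule, not a verbatim transcription of the
    paper's Algorithm 1. `t` is the 1-indexed iteration counter.
    """
    m = beta * m + (1.0 - beta) * grad              # first-moment (momentum) estimate
    v = beta_sq * v + (1.0 - beta_sq) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta ** t)                   # bias correction
    v_hat = v / (1.0 - beta_sq ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + lam)  # λ keeps the denominator away from zero
    return theta, m, v
```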
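
The Experiment Setup row quotes the Figure 1 hyperparameters (β = 0.9, β_sq = 0.999, η = 0.001, with λ varied). Assuming λ corresponds to the eps argument of PyTorch's Adam, as the Open Source Code row suggests, the optimizer configuration could be reproduced roughly as follows; the ResNet-18 model here is a placeholder, since the specific CIFAR-10 architectures used in Figure 1 are not listed above.

```python
import torch
import torchvision

# Placeholder CIFAR-10 model; the exact architectures in Figure 1 are not specified above.
model = torchvision.models.resnet18(num_classes=10)

# Assumed mapping from the paper's notation to PyTorch's Adam arguments:
#   η (learning rate) -> lr, (β, β_sq) -> betas, λ -> eps.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,  # λ; Figure 1 sweeps this value
)
```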