On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Authors: Yusu Hong, Junhong Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study Adam in non-convex smooth scenarios with potential unbounded gradients and affine variance noise. ... We show that Adam with a specific hyper-parameter setup can find a stationary point with a O(1/√T) rate in high probability... Moreover, we show that under the same setup, Adam without corrective terms and RMSProp can find a stationary point with a O(1/T + σ₀/√T) rate... We also provide a probabilistic convergence result for Adam under a generalized smooth condition... It would be advantageous to provide experimental results to validate the hyper-parameter settings in our results. |
| Researcher Affiliation | Academia | Yusu Hong, Center for Data Science and School of Mathematical Sciences, Zhejiang University (yusuhong@zju.edu.cn); Junhong Lin, Center for Data Science, Zhejiang University (junhong@zju.edu.cn) |
| Pseudocode | Yes | Algorithm 1 Adam |
| Open Source Code | No | The paper focuses on theoretical analysis and does not mention any release of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not include empirical evaluation on datasets, thus no dataset access information is relevant. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation or data, therefore no dataset split information is provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not include experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes hyper-parameter settings for the Adam algorithm itself within its theoretical framework (e.g., 'β1, β2 ∈ [0, 1)', 'η, ϵ > 0'), but does not detail an experimental setup with training configurations or system-level settings for empirical runs. |
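
For context on the hyper-parameters referenced above (β1, β2 ∈ [0, 1); η, ϵ > 0), the following is a minimal sketch of a standard bias-corrected Adam update of the kind the paper's Algorithm 1 analyzes. The function name, default values, and toy objective are illustrative assumptions, not the specific setup used in the paper's theorems.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with bias correction.

    The symbols mirror the paper's hyper-parameters (beta1, beta2 in [0, 1);
    eta, eps > 0), but the defaults here are common practical choices,
    not the theoretical schedule from the paper.
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = ||x||^2 with noisy (stochastic) gradients.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta + 0.1 * rng.normal(size=5)  # stochastic gradient oracle
    theta, m, v = adam_step(theta, grad, m, v, t)
print(np.linalg.norm(theta))  # gradient norm shrinks toward a stationary point
```

Dropping the two bias-correction lines gives the "Adam without corrective terms" variant mentioned in the abstract, and setting β1 = 0 recovers an RMSProp-style update.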