Adam with model exponential moving average is effective for nonconvex optimization
Authors: Kwangjun Ahn, Ashok Cutkosky
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). |
| Researcher Affiliation | Collaboration | Kwangjun Ahn, Microsoft Research, Cambridge, MA (kwangjunahn@microsoft.com); Ashok Cutkosky, Boston University, Boston, MA (ashok@cutkosky.com) |
| Pseudocode | Yes | Algorithm 1 Discounted-to-nonconvex conversion (choosing increments via online learning) |
| Open Source Code | No | The paper is theoretical and does not mention releasing any source code. |
| Open Datasets | No | This is a theoretical paper and does not involve empirical experiments or datasets. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical experiments or dataset splits. |
| Hardware Specification | No | This is a theoretical paper and does not involve empirical experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | This is a theoretical paper and does not involve empirical experiments, so no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | This is a theoretical paper and does not involve empirical experiments, so no experimental setup details like hyperparameters are provided. |
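
The paper's subject, Adam combined with a model exponential moving average (EMA), can be illustrated with a short sketch. The code below is a minimal, illustrative implementation under assumed hyperparameters and a made-up toy nonconvex objective; it is not the paper's Algorithm 1 (the discounted-to-nonconvex conversion) nor the exact variant analyzed in the theory.

```python
# Minimal sketch (assumptions: toy objective, hyperparameters chosen only for
# demonstration): plain Adam updates, with a model EMA of the iterates
# maintained on the side as the returned "averaged" model.
import numpy as np

def grad(x):
    # Gradient of a simple nonconvex objective f(x) = sum(x^2 + sin(3x)).
    return 2.0 * x + 3.0 * np.cos(3.0 * x)

def adam_with_model_ema(x0, steps=1000, lr=1e-2, beta1=0.9, beta2=0.999,
                        eps=1e-8, ema_decay=0.999):
    x = x0.astype(float).copy()
    m = np.zeros_like(x)      # first-moment (momentum) estimate
    v = np.zeros_like(x)      # second-moment estimate
    x_ema = x.copy()          # model EMA of the optimizer's iterates
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
        # Model EMA averages the iterates themselves, not the gradients.
        x_ema = ema_decay * x_ema + (1.0 - ema_decay) * x
    return x, x_ema

if __name__ == "__main__":
    last_iterate, ema_iterate = adam_with_model_ema(np.array([2.5, -1.7]))
    print("last iterate:", last_iterate)
    print("EMA iterate: ", ema_iterate)
```

The design point this sketch is meant to highlight is that the EMA is applied to the model parameters produced by Adam, which is the pairing whose nonconvex optimization behavior the paper analyzes theoretically.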