Adam with model exponential moving average is effective for nonconvex optimization

Authors: Kwangjun Ahn, Ashok Cutkosky

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA).
Researcher Affiliation | Collaboration | Kwangjun Ahn, Microsoft Research, Cambridge, MA (kwangjunahn@microsoft.com); Ashok Cutkosky, Boston University, Boston, MA (ashok@cutkosky.com)
Pseudocode | Yes | Algorithm 1: Discounted-to-nonconvex conversion (choosing increments via online learning). A rough illustrative sketch of Adam paired with a model EMA is given after this table.
Open Source Code | No | The paper is theoretical and does not mention releasing any source code.
Open Datasets | No | This is a theoretical paper and does not involve empirical experiments or datasets.
Dataset Splits | No | This is a theoretical paper and does not involve empirical experiments or dataset splits.
Hardware Specification | No | This is a theoretical paper and does not involve empirical experiments, so no hardware specifications are mentioned.
Software Dependencies | No | This is a theoretical paper and does not involve empirical experiments, so no specific software dependencies with version numbers are listed.
Experiment Setup | No | This is a theoretical paper and does not involve empirical experiments, so no experimental setup details like hyperparameters are provided.
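The paper provides only pseudocode (Algorithm 1, a discounted-to-nonconvex conversion that chooses increments via online learning) and no released code. As a rough illustration of the two techniques named in the abstract, the sketch below pairs a textbook Adam update with an exponential moving average of the model parameters. It is not the paper's Algorithm 1; the function name, hyperparameter values, and toy objective are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch, NOT the paper's Algorithm 1: a textbook Adam update combined with
# a model exponential moving average (EMA) of the iterates. All names and
# hyperparameter values here are illustrative assumptions.
import numpy as np

def adam_with_model_ema(grad_fn, x0, steps=1000, lr=1e-3,
                        beta1=0.9, beta2=0.999, eps=1e-8, ema_decay=0.999):
    """Run Adam on (possibly stochastic) gradients from grad_fn and return an
    exponential moving average of the iterates (the "model EMA")."""
    x = x0.astype(float)
    x_ema = x.copy()
    m = np.zeros_like(x)   # first-moment (momentum) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
        # Model EMA: average the parameters themselves, not the gradients.
        x_ema = ema_decay * x_ema + (1 - ema_decay) * x
    return x_ema

# Toy usage with noisy gradients of f(x) = ||x||^2 / 2.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(adam_with_model_ema(noisy_grad, x0=np.ones(3)))
```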