Adam Can Converge Without Any Modification On Update Rules

Authors: Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments for different choices of (β1, β2) on a few tasks. First, we run Adam on a convex function (2) with fixed n (see the definition in Section 3.2). Second, we run Adam on the classification problem on the MNIST and CIFAR-10 datasets with a fixed batch size. We observe some interesting phenomena in Figure 1 (a), (b), and (c). While Adam's performance seems unstable in the red region, we find that Adam always performs well in the top blue region of Figure 1. (An illustrative (β1, β2) sweep sketch appears after this table.)
Researcher Affiliation | Academia | The Chinese University of Hong Kong, Shenzhen, China; University of Michigan, US; Shenzhen Research Institute of Big Data
Pseudocode | Yes | We present randomly shuffled Adam in Algorithm 1. In Algorithm 1, m denotes the 1st-order momentum and v denotes the 2nd-order momentum; they are exponentially weighted averages controlled by the hyperparameters β1 and β2, respectively. (A minimal sketch of these moment updates appears after this table.)
Open Source Code | No | No explicit statement or link providing concrete access to source code for the described methodology was found. The paper mentions 'open-source' only in the context of general deep learning libraries, not for the authors' own implementation.
Open Datasets | Yes | Second, we run Adam on the classification problem on the MNIST and CIFAR-10 datasets with a fixed batch size. These datasets are commonly referenced and cited: Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012. Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009.
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. The paper mentions using a fixed batch size for the MNIST and CIFAR-10 experiments but gives no train/validation/test splits.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used to run the experiments were found. The paper describes the experimental setup but does not specify hardware.
Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments were found. The paper does not mention software versions.
Experiment Setup | Yes | All the experimental settings and hyperparameters are presented in Appendix B.1. (The detailed experiment setup is deferred to the appendix.)
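
To make the (β1, β2) sweep quoted in the Research Type row concrete, here is a hedged PyTorch sketch that grid-searches the two momentum parameters on a toy quadratic objective. The objective, grid values, learning rate, and step count are illustrative assumptions, not the paper's settings; those are given in its Appendix B.1.

```python
import itertools
import torch

def loss_after_training(beta1, beta2, steps=2000, lr=1e-3):
    # Hypothetical toy problem: fit a vector to a fixed target with Adam.
    torch.manual_seed(0)
    x = torch.zeros(10, requires_grad=True)
    target = torch.linspace(-1.0, 1.0, 10)
    opt = torch.optim.Adam([x], lr=lr, betas=(beta1, beta2))
    for _ in range(steps):
        opt.zero_grad()
        loss = ((x - target) ** 2).sum()
        loss.backward()
        opt.step()
    # Recompute the loss at the final iterate for reporting.
    return ((x - target) ** 2).sum().item()

# Assumed grid purely for illustration; the paper sweeps its own (β1, β2) range.
for b1, b2 in itertools.product([0.0, 0.5, 0.9], [0.5, 0.9, 0.999]):
    print(f"beta1={b1}, beta2={b2}: final loss {loss_after_training(b1, b2):.4e}")
```

On this convex toy problem every setting converges; the paper's point is that on its test problems the outcome depends on where (β1, β2) falls relative to the regions shown in its Figure 1.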
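The Pseudocode row describes Algorithm 1 only at a high level, so below is a minimal NumPy sketch of the standard Adam moment updates it refers to. This is a generic illustration under the usual Adam conventions; details specific to the paper's randomly shuffled Algorithm 1 (the epoch-wise shuffling, whether bias correction is applied, the eps term) are assumptions not fixed by the quoted excerpt.

```python
import numpy as np

def adam_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # 1st-order momentum m: exponentially weighted average of gradients (beta1).
    m = beta1 * m + (1 - beta1) * grad
    # 2nd-order momentum v: exponentially weighted average of squared gradients (beta2).
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Parameter step scaled elementwise by the square root of the 2nd moment.
    theta = theta - lr * m / (np.sqrt(v) + eps)
    return theta, m, v
```

In the paper's setting the gradient passed in at each step would come from one minibatch of a randomly shuffled epoch, with m and v carried across steps.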