Adam Can Converge Without Any Modification On Update Rules
Authors: Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments for different choices of (β1, β2) on a few tasks. First, we run Adam on a convex function (2) with fixed n (see the definition in Section 3.2). Second, we run Adam on the classification problem for MNIST and CIFAR-10 with fixed batch size. We observe some interesting phenomena in Figure 1 (a), (b), and (c): while Adam's performance seems unstable in the red region, we find that Adam always performs well in the top blue region of Figure 1. |
| Researcher Affiliation | Academia | 1The Chinese University of Hong Kong, Shenzhen, China 2University of Michigan, US 3Shenzhen Research Institute of Big Data |
| Pseudocode | Yes | We present randomly shuffled Adam in Algorithm 1. In Algorithm 1, m denotes the 1st-order momentum and v denotes the 2nd-order momentum. They are exponentially weighted averages governed by the hyperparameters β1 and β2, respectively. |
| Open Source Code | No | No explicit statement or link providing concrete access to source code for the described methodology was found. The paper mentions 'open-source' only in the context of general deep learning libraries, not for their specific implementation. |
| Open Datasets | Yes | Second, we run Adam on the classification problem for MNIST and CIFAR-10 with fixed batch size. These datasets are commonly referenced and cited: Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012. Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. The paper mentions using 'fixed batchsize' for MNIST and CIFAR-10 experiments, but no train/validation/test splits. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments were found. The paper describes the experimental setup but does not specify hardware. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment were found. The paper does not mention software versions. |
| Experiment Setup | Yes | All the experimental settings and hyperparameters are presented in Appendix B.1; the paper defers the detailed experiment setup to the appendix rather than stating it in the main text. |
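The momentum recursions quoted in the Pseudocode row above follow the standard Adam update. Since the paper's Algorithm 1 (randomly shuffled Adam) is not reproduced in this report, the sketch below is a minimal deterministic Adam step on a toy quadratic; the learning rate, β1, β2, and ε values are illustrative defaults, not the paper's settings, and the per-epoch random shuffling of samples is omitted:

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are the 1st- and 2nd-order momentum:
    exponentially weighted averages of the gradient and its elementwise
    square, governed by beta1 and beta2 respectively."""
    m = beta1 * m + (1 - beta1) * grad       # 1st-order momentum
    v = beta2 * v + (1 - beta2) * grad**2    # 2nd-order momentum
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy convex objective f(x) = x^2 (gradient 2x), standing in for the
# paper's convex test function (2), which is not reproduced here.
theta = np.array([5.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

After 5000 steps the iterate sits near the minimizer at 0, illustrating the convergent regime the paper studies for suitable (β1, β2).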