Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Authors: Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In order to show that the growth rate condition of the cumulative stochastic gradient indeed holds, we have conducted experiments to estimate the growth rate parameter s for the ResNet-18 (He et al., 2016) model and a 3-layer LSTM model (Hochreiter & Schmidhuber, 1997) respectively. For simplicity, we assume G = 1 and estimate the growth rate s from the logarithm of the cumulative gradient norm, log ‖g_{1:T,i}‖_2. As can be seen from Table 2, the s of adaptive gradient methods (AdaGrad, RMSProp, and AMSGrad) is smaller than that of SGDM for training the 3-layer LSTM model on the Penn Treebank (Marcus et al., 1993) dataset.
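The growth-rate estimate described above can be sketched numerically: if the cumulative gradient norm grows like T^s, then s is the slope of log ‖g_{1:T,i}‖_2 against log T. The sketch below is an illustrative reconstruction under that assumption, not the paper's exact estimation procedure; the aggregation over coordinates i (here a sum of per-coordinate norms) is an assumption.

```python
import numpy as np

def estimate_growth_rate(grads):
    """Estimate the growth-rate exponent s, assuming the cumulative
    stochastic-gradient norm grows like O(T^s).

    grads: array of shape (T, d), one stochastic gradient per step.
    Fits log(sum_i ||g_{1:t,i}||_2) against log t by least squares;
    the fitted slope is the estimate of s.
    """
    grads = np.asarray(grads, dtype=float)
    T = grads.shape[0]
    # Per-coordinate cumulative norm ||g_{1:t,i}||_2 for t = 1..T.
    cum_norm_per_coord = np.sqrt(np.cumsum(grads ** 2, axis=0))  # (T, d)
    total = cum_norm_per_coord.sum(axis=1)                       # (T,)
    t = np.arange(1, T + 1)
    slope, _ = np.polyfit(np.log(t), np.log(total), 1)
    return slope

# Sanity check: i.i.d. Gaussian gradients give ||g_{1:t,i}||_2 ~ sqrt(t),
# so the estimated exponent should be close to 0.5.
rng = np.random.default_rng(0)
s_hat = estimate_growth_rate(rng.normal(size=(5000, 10)))
```

A smaller fitted s for one optimizer's gradient trajectory than another's is the kind of comparison reported in the paper's Table 2.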
Researcher Affiliation | Academia | Dongruo Zhou (EMAIL), Indiana University; Jinghui Chen (EMAIL), The Pennsylvania State University; Yuan Cao (EMAIL), The University of Hong Kong; Ziyan Yang (EMAIL), Rice University; Quanquan Gu (EMAIL), University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: AMSGrad (Reddi et al., 2018); Algorithm 2: RMSProp (Tieleman & Hinton, 2012), modified according to Reddi et al. (2018); Algorithm 3: AdaGrad (Duchi et al., 2011)
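For orientation, a minimal sketch of the AMSGrad update from Reddi et al. (2018), which the paper presents as Algorithm 1. This is a bias-uncorrected illustration, not a transcription of the paper's pseudocode; the function and state layout are my own.

```python
import numpy as np

def amsgrad_step(params, grads, state, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad step: like Adam, but the denominator uses the running
    elementwise maximum v_hat of the second-moment estimate, so the
    effective per-coordinate step size is non-increasing."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grads          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grads ** 2     # second-moment estimate
    v_hat = np.maximum(v_hat, v)                 # key difference from Adam
    params = params - lr * m / (np.sqrt(v_hat) + eps)
    return params, (m, v, v_hat)
```

Usage: initialize the state to three zero arrays the shape of the parameters and call `amsgrad_step` once per stochastic gradient; RMSProp and AdaGrad differ only in how the denominator statistic is accumulated.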
Open Source Code | No | The paper does not contain any explicit statements about code availability, nor does it provide links to a code repository. The acknowledgment mentions AWS cloud credits but not code release.
Open Datasets | Yes | As can be seen from Table 2, the s of adaptive gradient methods (AdaGrad, RMSProp, and AMSGrad) is smaller than that of SGDM for training the 3-layer LSTM model on the Penn Treebank (Marcus et al., 1993) dataset.
Dataset Splits | No | The paper mentions using the Penn Treebank dataset for training but does not provide specific details on how the dataset was split into training, validation, or test sets.
Hardware Specification | No | The paper mentions "We also thank AWS for providing cloud computing credits associated with the NSF BIGDATA award." but does not specify any particular hardware models (e.g., GPU, CPU models, memory sizes) used for the experiments.
Software Dependencies | No | The paper does not mention any specific software or library versions used for the experiments. It refers to models like ResNet-18 and LSTM but no versioned software dependencies.
Experiment Setup | No | The paper discusses models (ResNet-18, 3-layer LSTM) and a dataset (Penn Treebank) for empirical evaluation of growth rates but does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings for these experiments.