Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Authors: Dongruo Zhou, Jinghui Chen, Yuan Cao, Ziyan Yang, Quanquan Gu

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In order to show that the growth rate condition of the cumulative stochastic gradient indeed holds, we have conducted experiments to estimate the growth rate parameter s for the ResNet-18 (He et al., 2016) model and a 3-layer LSTM model (Hochreiter & Schmidhuber, 1997) respectively. For simplicity, we assume G = 1 and estimate the growth rate s from the logarithm of the cumulative gradient norm, log ‖g_{1:T,i}‖_2. As can be seen from Table 2, the s of adaptive gradient methods (AdaGrad, RMSProp, and AMSGrad) is smaller than that of SGDM for training the 3-layer LSTM model on the Penn Treebank (Marcus et al., 1993) dataset.
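The growth-rate estimate described above can be sketched numerically: if the cumulative gradient norm grows like T^s, then s is the slope of log ‖g_{1:T,i}‖_2 against log T. The sketch below is an illustrative reconstruction under that assumption, not the paper's exact estimation procedure; the aggregation over coordinates i (here a sum of per-coordinate norms) is an assumption.

```python
import numpy as np

def estimate_growth_rate(grads):
    """Estimate the growth-rate exponent s, assuming the cumulative
    stochastic-gradient norm grows like O(T^s).

    grads: array of shape (T, d), one stochastic gradient per step.
    Fits log(sum_i ||g_{1:t,i}||_2) against log t by least squares;
    the fitted slope is the estimate of s.
    """
    grads = np.asarray(grads, dtype=float)
    T = grads.shape[0]
    # Per-coordinate cumulative norm ||g_{1:t,i}||_2 for t = 1..T.
    cum_norm_per_coord = np.sqrt(np.cumsum(grads ** 2, axis=0))  # (T, d)
    total = cum_norm_per_coord.sum(axis=1)                       # (T,)
    t = np.arange(1, T + 1)
    slope, _ = np.polyfit(np.log(t), np.log(total), 1)
    return slope

# Sanity check: i.i.d. Gaussian gradients give ||g_{1:t,i}||_2 ~ sqrt(t),
# so the estimated exponent should be close to 0.5.
rng = np.random.default_rng(0)
s_hat = estimate_growth_rate(rng.normal(size=(5000, 10)))
```

A smaller fitted s for one optimizer's gradient trajectory than another's is the kind of comparison reported in the paper's Table 2.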
Researcher Affiliation | Academia | Dongruo Zhou (EMAIL), Indiana University; Jinghui Chen (EMAIL), The Pennsylvania State University; Yuan Cao (EMAIL), The University of Hong Kong; Ziyan Yang (EMAIL), Rice University; Quanquan Gu (EMAIL), University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: AMSGrad (Reddi et al., 2018); Algorithm 2: RMSProp (Tieleman & Hinton, 2012), modified according to Reddi et al. (2018); Algorithm 3: AdaGrad (Duchi et al., 2011)
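For orientation, a minimal sketch of the AMSGrad update from Reddi et al. (2018), which the paper presents as Algorithm 1. This is a bias-uncorrected illustration, not a transcription of the paper's pseudocode; the function and state layout are my own.

```python
import numpy as np

def amsgrad_step(params, grads, state, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad step: like Adam, but the denominator uses the running
    elementwise maximum v_hat of the second-moment estimate, so the
    effective per-coordinate step size is non-increasing."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grads          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grads ** 2     # second-moment estimate
    v_hat = np.maximum(v_hat, v)                 # key difference from Adam
    params = params - lr * m / (np.sqrt(v_hat) + eps)
    return params, (m, v, v_hat)
```

Usage: initialize the state to three zero arrays the shape of the parameters and call `amsgrad_step` once per stochastic gradient; RMSProp and AdaGrad differ only in how the denominator statistic is accumulated.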
Open Source Code | No | The paper does not contain any explicit statements about code availability, nor does it provide links to a code repository. The acknowledgment mentions AWS cloud credits but not code release.
Open Datasets | Yes | As can be seen from Table 2, the s of adaptive gradient methods (AdaGrad, RMSProp, and AMSGrad) is smaller than that of SGDM for training the 3-layer LSTM model on the Penn Treebank (Marcus et al., 1993) dataset.
Dataset Splits | No | The paper mentions using the Penn Treebank dataset for training but does not provide specific details on how the dataset was split into training, validation, or test sets.
Hardware Specification | No | The paper mentions "We also thank AWS for providing cloud computing credits associated with the NSF BIGDATA award." but does not specify any particular hardware models (e.g., GPU, CPU models, memory sizes) used for the experiments.
Software Dependencies | No | The paper does not mention any specific software or library versions used for the experiments. It refers to models like ResNet-18 and LSTM but no versioned software dependencies.
Experiment Setup | No | The paper discusses models (ResNet-18, 3-layer LSTM) and a dataset (Penn Treebank) for empirical evaluation of growth rates but does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings for these experiments.