Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Authors: Rachel Ward, Xiaoxia Wu, Leon Bottou
JMLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments are provided to corroborate our theoretical findings; moreover, the experiments suggest that the robustness of AdaGrad-Norm extends to the models in deep learning. |
| Researcher Affiliation | Collaboration | Rachel Ward EMAIL Xiaoxia Wu EMAIL Department of Mathematics The University of Texas at Austin 2515 Speedway, Austin, TX, 78712, USA Léon Bottou EMAIL Facebook AI Research 770 Broadway, New York, NY, 10019, USA |
| Pseudocode | Yes | Algorithm 1 AdaGrad-Norm... Algorithm 2 Gradient Descent with Line Search Method... Algorithm 3 AdaGrad-Norm with momentum in PyTorch |
| Open Source Code | Yes | Details in implementing AdaGrad-Norm in a neural network are explained in the appendix and the code is also provided. [Footnote 4: https://github.com/xwuShirley/pytorch/blob/master/torch/optim/adagradnorm.py] |
| Open Datasets | Yes | Datasets and Models: We test on three data sets: MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009); see Table 1 in the appendix for detailed descriptions. |
| Dataset Splits | Yes | Table 1 (statistics of data sets; Dim is the dimension of a sample): MNIST — 60,000 train / 10,000 test, 10 classes, 28×28; CIFAR-10 — 50,000 train / 10,000 test, 10 classes, 32×32; ImageNet — 1,281,167 train / 50,000 test, 1000 classes, various dimensions. |
| Hardware Specification | No | For both data sets, we use 256 images per iteration (2 GPUs with 128 images/GPU, 234 iterations per epoch for MNIST and 196 iterations per epoch for CIFAR-10). For ImageNet, we use ResNet-50 and 256 images per iteration (8 GPUs with 32 images/GPU, 5004 iterations per epoch). The paper mentions using GPUs but does not specify the GPU model or type, which is required for a specific hardware description. |
| Software Dependencies | No | The experiments are done in PyTorch (Paszke et al., 2017). This citation refers to PyTorch but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We set η = 1 in all AdaGrad implementations, noting that in all these problems we know that F* = 0 and we measure that F(x0) is between 1 and 10. Indeed, we approximate the loss using a sample of 256 images as (1/256) Σ_{i=1}^{256} f_i(x0): 2.4129 for logistic regression, 2.305 for the two-layer fully connected model, 2.301 for the convolutional neural network, 2.3848 for ResNet-18 with the learnable parameters in BatchNorm disabled, 2.3459 for ResNet-18 with default BatchNorm, and 7.704 for ResNet-50. We vary the initialization b0 while fixing all other parameters and plot the training and testing accuracy after different numbers of epochs. For the MNIST experiment, we do not use bias, regularization (zero weight decay), dropout, momentum, batch normalization (Ioffe and Szegedy, 2015), or any other added features that help achieve SOTA performance (see Figure 3 and Figure 4). |
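For readers unfamiliar with the algorithm the rows above reference, the core AdaGrad-Norm update (Algorithm 1 in the paper) maintains a single scalar stepsize denominator b_t that accumulates squared gradient norms. The following is a minimal illustrative sketch in plain Python, not the authors' PyTorch implementation linked above; the function name `adagrad_norm` and its parameters are placeholders chosen here for illustration.

```python
import math

def adagrad_norm(grad_fn, x0, b0=0.1, eta=1.0, steps=100):
    """Illustrative sketch of the AdaGrad-Norm update with a scalar stepsize.

    Per-step update (norm version of AdaGrad):
        b_{t+1}^2 = b_t^2 + ||g_t||^2
        x_{t+1}   = x_t - (eta / b_{t+1}) * g_t
    grad_fn maps an iterate (list of floats) to its gradient (list of floats).
    """
    x = list(x0)
    b_sq = b0 * b0
    for _ in range(steps):
        g = grad_fn(x)
        b_sq += sum(gi * gi for gi in g)          # accumulate squared gradient norm
        step = eta / math.sqrt(b_sq)              # self-tuned scalar stepsize
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final = adagrad_norm(lambda x: x, [2.0, -1.0], b0=0.1, eta=1.0, steps=100)
```

On this toy quadratic the accumulated b_t stabilizes once gradients shrink, so the iterates contract toward the minimizer regardless of how small b0 was initialized, which mirrors the robustness-to-b0 experiments described in the Experiment Setup row.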