Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes
Authors: Rachel Ward, Xiaoxia Wu, Léon Bottou
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in Section 4 shows that the robustness of AdaGrad-Norm extends from simple linear regression to state-of-the-art models in deep learning, without sacrificing generalization. |
| Researcher Affiliation | Collaboration | Rachel Ward*¹², Xiaoxia Wu*¹², Léon Bottou². *Equal contribution. ¹Department of Mathematics, The University of Texas at Austin, USA. ²Facebook AI Research, New York, USA. |
| Pseudocode | Yes | Algorithm 1 AdaGrad-Norm |
| Open Source Code | Yes | Details in implementing AdaGrad-Norm in a neural network are explained in the appendix and the code is also provided. Footnote 3: AdaGrad-Norm https://github.com/xwuShirley/pytorch/blob/master/torch/optim/adagradnorm.py |
| Open Datasets | Yes | Datasets and Models We test on three data sets: MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009) |
| Dataset Splits | No | The paper mentions using "mini-batch" sizes for training and reports training/testing accuracy, but does not explicitly state the dataset split percentages or sample counts for train/validation/test sets. It also refers to "standard setup" for ResNet without elaboration on splits. |
| Hardware Specification | No | The paper mentions "2 GPUs" and "8 GPUs" used for experiments but does not specify the exact GPU models, CPU models, or other hardware specifications. |
| Software Dependencies | No | The paper states experiments are "done in PyTorch (Paszke et al., 2017)" but does not specify the version number of PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We set η = 1 in AdaGrad-Norm implementations, noting that in all these problems we know that F* = 0 and measure that F(x0) is between 1 and 10. [...] For both data sets, we use simple SGD without momentum and set mini-batch of 128 images per iteration [...] For ImageNet, we use ResNet-50 with no momentum and 256 images for one iteration. [...] In addition, we set the initialization of weights in the last fully connected layer to be i.i.d. Gaussian with zero mean and variance 1/2048. |
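
To make the Pseudocode and Experiment Setup rows more concrete, here is a minimal NumPy sketch of an AdaGrad-Norm style loop. It is not the authors' PyTorch implementation. It assumes the usual scalar form of the update, in which a single accumulator `b` grows by the squared gradient norm and the step is `eta / b`. The table does not reproduce Algorithm 1, so that form is an assumption here. The helper name `adagrad_norm`, the starting value `b0 = 0.1`, the toy least-squares problem, and the `(10, 2048)` weight shape are all hypothetical. Only `eta = 1` and the Gaussian initialization with variance 1/2048 come from the setup quoted above.

```python
import numpy as np


def adagrad_norm(grad_fn, x0, eta=1.0, b0=0.1, steps=1000):
    """Illustrative AdaGrad-Norm style loop with one scalar stepsize.

    Assumed update (not copied from the paper's code):
        b^2 <- b^2 + ||g||^2
        x   <- x - (eta / b) * g
    """
    x = np.asarray(x0, dtype=float).copy()
    b_sq = b0 ** 2  # b0 is a hypothetical starting value
    for _ in range(steps):
        g = grad_fn(x)
        b_sq += float(np.dot(g, g))          # accumulate squared gradient norm
        x -= (eta / np.sqrt(b_sq)) * g       # step with the updated accumulator
    return x, float(np.sqrt(b_sq))


# Toy noiseless least-squares problem, so the minimum loss value is 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
y = A @ x_true


def loss(x):
    r = A @ x - y
    return 0.5 * float(np.mean(r ** 2))


def grad(x):
    return A.T @ (A @ x - y) / len(y)


x_init = np.zeros(5)
x_hat, b_final = adagrad_norm(grad, x_init, eta=1.0, b0=0.1, steps=2000)

# Last fully connected layer: i.i.d. Gaussian, zero mean, variance 1/2048.
# The (10, 2048) shape is a placeholder for (num_classes, feature_dim).
fc_weights = rng.normal(loc=0.0, scale=np.sqrt(1.0 / 2048), size=(10, 2048))
```

Because the accumulator `b` never decreases, the effective stepsize `eta / b` can only shrink over time. This is why a single choice such as `eta = 1` can plausibly be reused across problems without per-task tuning.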