Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Authors: Chaoyue Liu, Dmitriy Drusvyatskiy, Misha Belkin, Damek Davis, Yian Ma
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a concrete illustration of the disparity between theory and practice, Figure 1 depicts the convergence behavior of SGD for training a neural network on the MNIST data set. In both cases, we observe that the estimate stays positive, which suggests that aiming condition holds. |
| Researcher Affiliation | Academia | Chaoyue Liu*, Dmitriy Drusvyatskiy**, Yian Ma*, Damek Davis***, and Mikhail Belkin* *Halıcıoğlu Data Science Institute, University of California San Diego **Mathematics Department, University of Washington ***School of Operations Research and Information Engineering, Cornell University |
| Pseudocode | Yes | Algorithm 1 SGD(w0, η, T) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Figure 1: Convergence plot of SGD when training a fully connected neural network with 3 hidden layers and 1000 neurons in each on MNIST (left) and a ResNet-28 on CIFAR-10 (right). We conduct the experiments on two datasets, MNIST and CIFAR-10. |
| Dataset Splits | No | The paper mentions total image counts for MNIST (60k) and CIFAR-10 (60k) but does not provide explicit training/validation/test split percentages or sample counts, nor does it refer to predefined standard splits for reproduction beyond mentioning the datasets themselves. |
| Hardware Specification | Yes | Specifically, we used the resources from SDSC Expanse GPU compute nodes, and NCSA Delta system, via allocations TG-CIS220009. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | We train a fully-connected neural network on the MNIST dataset. The network has 4 hidden layers, each with 1024 neurons. We optimize the MSE loss using SGD with a batch size 512 and a learning rate 0.5. The training was run over 1k epochs, and the ratio $\mathbb{E}[\lVert \nabla \ell(w, z) \rVert^2] / \lVert \nabla L(w) \rVert^2$ is evaluated every 100 epochs. |
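
The table cites only the signature of Algorithm 1, SGD(w0, η, T). As a rough illustration of what a constant-step SGD loop with that signature might look like, here is a minimal pure-Python sketch. It is not the authors' code: the toy overparametrized least-squares problem, the helper names (`sgd`, `run_demo`, `grad_sample`, `draw`), single-sample updates, and the step-size choice are all assumptions made for the example. It does not reproduce the reported MNIST setup (batch size 512, learning rate 0.5).

```python
import random


def sgd(w0, eta, T, grad_sample, draw, rng):
    """Constant-step SGD: repeat w <- w - eta * grad l(w, z) for T sampled z."""
    w = list(w0)
    for _ in range(T):
        z = draw(rng)                      # sample one data point
        g = grad_sample(w, z)              # stochastic gradient at w
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


def run_demo(n=5, d=20, T=5000, seed=0):
    """Hypothetical toy problem: n samples, d > n parameters, noiseless labels,
    so some parameter vector fits every sample exactly (interpolation)."""
    rng = random.Random(seed)
    w_star = [rng.gauss(0, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(x, w_star)) for x in X]
    data = list(zip(X, y))

    def residual(w, z):
        x, target = z
        return sum(a * b for a, b in zip(x, w)) - target

    def grad_sample(w, z):
        # Gradient of the per-sample loss 0.5 * (<x, w> - y)^2.
        r = residual(w, z)
        return [r * a for a in z[0]]

    def full_loss(w):
        return sum(0.5 * residual(w, z) ** 2 for z in data) / n

    # Assumed step size: 1 / max ||x||^2, the largest per-sample smoothness constant.
    eta = 1.0 / max(sum(a * a for a in x) for x in X)
    w0 = [0.0] * d
    w_T = sgd(w0, eta, T, grad_sample, lambda r: r.choice(data), rng)
    return full_loss(w0), full_loss(w_T)


initial_loss, final_loss = run_demo()
print(f"loss before: {initial_loss:.3e}, after: {final_loss:.3e}")
```

Because the labels in this toy problem are noiseless and there are more parameters than samples, the stochastic gradients vanish at any interpolating solution, so a constant step size can be used without decay. That is the regime the paper's title refers to, though the sketch makes no claim about the paper's rates or conditions.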