Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

Authors: Chaoyue Liu, Dmitriy Drusvyatskiy, Misha Belkin, Damek Davis, Yian Ma

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a concrete illustration of the disparity between theory and practice, Figure 1 depicts the convergence behavior of SGD for training a neural network on the MNIST data set. In both cases, we observe that the estimate stays positive, which suggests that the aiming condition holds.
Researcher Affiliation | Academia | Chaoyue Liu*, Dmitriy Drusvyatskiy**, Yian Ma*, Damek Davis***, and Mikhail Belkin*; *Halıcıoğlu Data Science Institute, University of California San Diego; **Mathematics Department, University of Washington; ***School of Operations Research and Information Engineering, Cornell University
Pseudocode | Yes | Algorithm 1: SGD(w0, η, T) (a sketch of this loop appears after the table)
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Figure 1: Convergence plot of SGD when training a fully connected neural network with 3 hidden layers and 1000 neurons in each on MNIST (left) and a ResNet-28 on CIFAR-10 (right). We conduct the experiments on two datasets, MNIST and CIFAR-10.
Dataset Splits | No | The paper mentions total image counts for MNIST (60k) and CIFAR-10 (60k) but does not provide explicit training/validation/test split percentages or sample counts, nor does it point to predefined standard splits beyond naming the datasets themselves.
Hardware Specification | Yes | Specifically, we used the resources from SDSC Expanse GPU compute nodes and the NCSA Delta system, via allocation TG-CIS220009.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | We train a fully-connected neural network on the MNIST dataset. The network has 4 hidden layers, each with 1024 neurons. We optimize the MSE loss using SGD with a batch size of 512 and a learning rate of 0.5. The training is run for 1k epochs, and the ratio E[‖∇ℓ(w, z)‖²] / ‖∇L(w)‖² is evaluated every 100 epochs. (An illustrative setup sketch appears after the table.)
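
The Pseudocode row points to Algorithm 1, SGD(w0, η, T), which the paper states but this page does not reproduce. Below is a minimal sketch of such a loop under the standard reading (uniform sampling, constant step size η for T iterations); the helper `sample_grad` and the `data` container are illustrative assumptions, not names taken from the paper.

```python
# Minimal sketch of an SGD(w0, eta, T) loop; `sample_grad` and `data`
# are illustrative placeholders, not names from the paper.
import numpy as np

def sgd(w0, eta, T, sample_grad, data, seed=0):
    """Run T steps of SGD from w0 with constant step size eta.

    sample_grad(w, z) should return the stochastic gradient of the
    per-sample loss l(w, z); data is any indexable collection of samples.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(T):
        z = data[rng.integers(len(data))]   # draw a sample uniformly at random
        w -= eta * sample_grad(w, z)        # w_{t+1} = w_t - eta * grad l(w_t, z_t)
    return w
```

Any projection, averaging, or stopping rule that the authors' Algorithm 1 may include beyond this plain update is not reflected in the sketch.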
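
The Experiment Setup row fixes the quantities needed to re-create the MNIST run: 4 hidden layers of 1024 neurons, MSE loss, SGD with batch size 512 and learning rate 0.5, 1k epochs, and the ratio E[‖∇ℓ(w, z)‖²] / ‖∇L(w)‖² evaluated every 100 epochs. The following is a minimal sketch of that configuration, assuming PyTorch and torchvision since the paper names no software stack; the helper names (`make_model`, `grad_ratio`, `train`) and the subset sizes used to estimate the gradient-norm ratio are illustrative choices, not the authors' code.

```python
# Illustrative re-creation of the quoted setup: fully-connected net (4 x 1024),
# MSE loss, SGD with batch size 512 and lr 0.5 on MNIST, plus a crude
# Monte-Carlo estimate of E[||grad l(w,z)||^2] / ||grad L(w)||^2.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_model(width=1024, depth=4, n_classes=10):
    layers, dim = [nn.Flatten()], 28 * 28
    for _ in range(depth):                       # 4 hidden layers, 1024 neurons each
        layers += [nn.Linear(dim, width), nn.ReLU()]
        dim = width
    layers.append(nn.Linear(dim, n_classes))
    return nn.Sequential(*layers)

def mse_loss(logits, targets):
    # MSE against one-hot labels, matching the quoted "MSE loss" setup.
    return F.mse_loss(logits, F.one_hot(targets, 10).float())

def sq_grad_norm(model):
    return sum((p.grad.detach() ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)

def grad_ratio(model, dataset, device, n_single=64, n_full=4096):
    """Crude estimate of E[||grad l(w,z)||^2] / ||grad L(w)||^2 on data subsets."""
    model.zero_grad()
    x, y = next(iter(DataLoader(dataset, batch_size=n_full, shuffle=True)))
    mse_loss(model(x.to(device)), y.to(device)).backward()
    full_sq = sq_grad_norm(model)                # proxy for ||grad L(w)||^2
    single_sq = 0.0
    for i, (x, y) in enumerate(DataLoader(dataset, batch_size=1, shuffle=True)):
        if i == n_single:
            break
        model.zero_grad()
        mse_loss(model(x.to(device)), y.to(device)).backward()
        single_sq += sq_grad_norm(model)         # ||grad l(w, z_i)||^2
    model.zero_grad()
    return single_sq / n_single / max(full_sq, 1e-12)

def train(epochs=1000, eval_every=100, batch_size=512, lr=0.5):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = make_model().to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            mse_loss(model(x.to(device)), y.to(device)).backward()
            opt.step()
        if epoch % eval_every == 0:
            print(f"epoch {epoch}: ratio estimate {grad_ratio(model, train_set, device):.3f}")

if __name__ == "__main__":
    train()
```

The per-sample and full-batch gradients in `grad_ratio` are estimated on random subsets purely to keep the sketch cheap; how the authors computed the ratio in Figure 1 is not specified on this page.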