Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on synthetic data corroborate our theoretical findings. ... 6. Experiments: In this section, we seek to empirically observe the benign overfitting phenomenon for SGD in Gaussian least square problems and verify our theorems on the generalization performance of SGD. ... Results are shown in Figure 2. We see that (1) SGD, with either iterate averaging or tail averaging, is comparable to ridge regression, and significantly outperforms ordinary least square in some problem instances, and (2) SGD with tail averaging performs better than SGD with iterate averaging. These observations are consistent with our theoretical findings and demonstrate the benefit of the implicit regularization from SGD. |
| Researcher Affiliation | Academia | Difan Zou (EMAIL), Department of Computer Science & Institute of Data Science, The University of Hong Kong; Jingfeng Wu (EMAIL), Simons Institute, University of California, Berkeley; Vladimir Braverman (EMAIL), Department of Computer Science, Rice University; Quanquan Gu (EMAIL), Department of Computer Science, University of California, Los Angeles; Sham M. Kakade (EMAIL), Department of Computer Science & Department of Statistics, Harvard University |
| Pseudocode | No | The paper describes algorithms using mathematical equations, such as `w_t = w_{t-1} + γ(y_t − ⟨w_{t-1}, x_t⟩)x_t, t = 1, …, N`, but it does not include any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | No | 6. Experiments: We first consider three over-parameterized linear regression problem instances with d = 2000 and the spectrum of H as λ_i = i^{-1}, λ_i = i^{-1} log^{-2}(i), and λ_i = i^{-2}, respectively. Besides, the ground truth is fixed to be w*[i] = i^{-1}. ... Experimental results on synthetic data corroborate our theoretical findings. The paper uses synthetic data generated from specified parameters rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | 6. Experiments: The plots show the training and test risks achieved by SGD... SGD overfits the training sample... and generalizes well, which exhibits the benign overfitting phenomenon. The paper mentions a 'training sample' and a 'test sample' for its synthetic-data experiments but does not provide specific split percentages, sample counts, or an explicit methodology for how these splits were created (e.g., random seed, split ratio). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that were used for the experiments. |
| Experiment Setup | No | 6. Experiments: We consider three over-parameterized linear regression problem instances with d = 2000 and the spectrum of H as λ_i = i^{-1}, λ_i = i^{-1} log^{-2}(i), and λ_i = i^{-2}, respectively. Besides, the ground truth is fixed to be w*[i] = i^{-1}. ... The problem dimension is d = 2000 and the variance of the model noise is σ² = 1 (hence the Bayes risk is 1). ... the hyperparameters (i.e., γ for SGD and λ for ridge regression) are fine-tuned to achieve the best performance. While the paper describes the problem parameters and states that hyperparameters were tuned, it does not report the tuned values (e.g., the specific stepsize γ used in the experiments) or other training configurations. |
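The setup quoted in the table (one-pass constant-stepsize SGD with tail averaging on a synthetic Gaussian least-squares instance with d = 2000, power-law spectrum, and σ² = 1) can be sketched as follows. The stepsize γ, the sample size n, and the choice of the i^{-2} spectrum are illustrative assumptions on our part; the paper tunes γ and does not report the tuned value.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, sigma = 2000, 500, 1.0                       # over-parameterized: n < d
lam = np.arange(1, d + 1, dtype=float) ** -2.0     # spectrum of H: λ_i = i^{-2}
w_star = np.arange(1, d + 1, dtype=float) ** -1.0  # ground truth: w*[i] = i^{-1}

# x ~ N(0, H) with H = diag(λ); y = ⟨w*, x⟩ + noise, noise variance σ² = 1
X = rng.standard_normal((n, d)) * np.sqrt(lam)
y = X @ w_star + sigma * rng.standard_normal(n)

gamma = 0.01          # illustrative constant stepsize, NOT the paper's tuned γ
w = np.zeros(d)
iterates = np.empty((n, d))
for t in range(n):
    # SGD update: w_t = w_{t-1} + γ (y_t − ⟨w_{t-1}, x_t⟩) x_t
    w = w + gamma * (y[t] - X[t] @ w) * X[t]
    iterates[t] = w

w_tail = iterates[n // 2:].mean(axis=0)  # tail-averaged iterate (last half)

def excess_risk(v):
    """Population excess risk (v − w*)ᵀ H (v − w*) for diagonal H."""
    return float(lam @ (v - w_star) ** 2)

print("null predictor:", excess_risk(np.zeros(d)))
print("tail-averaged SGD:", excess_risk(w_tail))
```

Because H is diagonal, the population excess risk is available in closed form, so no held-out test set is needed to evaluate generalization in this sketch.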