Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on synthetic data corroborate our theoretical findings. ... 6. Experiments: In this section, we seek to empirically observe the benign overfitting phenomenon for SGD in Gaussian least square problems and verify our theorems on the generalization performance of SGD. ... Results are shown in Figure 2. We see that (1) SGD, with either iterate averaging or tail averaging, is comparable to ridge regression, and significantly outperforms ordinary least square in some problem instances, and (2) SGD with tail averaging performs better than SGD with iterate averaging. These observations are consistent with our theoretical findings and demonstrate the benefit of the implicit regularization from SGD. |
| Researcher Affiliation | Academia | Difan Zou (EMAIL), Department of Computer Science & Institute of Data Science, The University of Hong Kong; Jingfeng Wu (EMAIL), Simons Institute, University of California, Berkeley; Vladimir Braverman (EMAIL), Department of Computer Science, Rice University; Quanquan Gu (EMAIL), Department of Computer Science, University of California, Los Angeles; Sham M. Kakade (EMAIL), Department of Computer Science & Department of Statistics, Harvard University |
| Pseudocode | No | The paper describes algorithms using mathematical equations, such as `w_t = w_{t-1} + γ(y_t − ⟨w_{t-1}, x_t⟩)x_t, t = 1, …, N`, but it does not include any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | No | 6. Experiments: We first consider three over-parameterized linear regression problem instances with d = 2000 and the spectrum of H as λ_i = i^{-1}, λ_i = i^{-1} log^{-2}(i), and λ_i = i^{-2}, respectively. Besides, the ground truth is fixed to be w*[i] = i^{-1}. ... Experimental results on synthetic data corroborate our theoretical findings. The paper uses synthetic data generated from specified parameters rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | 6. Experiments: The plots show the training and test risks achieved by SGD... SGD overfits the training sample... and generalizes well, which exhibits the benign overfitting phenomenon. The paper mentions a 'training sample' and a 'test sample' for its synthetic-data experiments but does not provide specific split percentages, sample counts, or an explicit methodology for how these splits were created (e.g., random seed, split ratio). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that were used for the experiments. |
| Experiment Setup | No | 6. Experiments: We consider three over-parameterized linear regression problem instances with d = 2000 and the spectrum of H as λ_i = i^{-1}, λ_i = i^{-1} log^{-2}(i), and λ_i = i^{-2}, respectively. Besides, the ground truth is fixed to be w*[i] = i^{-1}. ... The problem dimension is d = 2000 and the variance of the model noise is σ² = 1 (hence the Bayes risk is 1). ... the hyperparameters (i.e., γ for SGD and λ for ridge regression) are fine-tuned to achieve the best performance. While the paper describes the problem parameters and states that hyperparameters were tuned, it does not report the tuned values (e.g., the specific stepsize γ used in the experiments) or other training configurations. |
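The setup quoted in the table (one-pass constant-stepsize SGD with tail averaging on a synthetic Gaussian least-squares instance with d = 2000, power-law spectrum, and σ² = 1) can be sketched as follows. The stepsize γ, the sample size n, and the choice of the i^{-2} spectrum are illustrative assumptions on our part; the paper tunes γ and does not report the tuned value.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, sigma = 2000, 500, 1.0                       # over-parameterized: n < d
lam = np.arange(1, d + 1, dtype=float) ** -2.0     # spectrum of H: λ_i = i^{-2}
w_star = np.arange(1, d + 1, dtype=float) ** -1.0  # ground truth: w*[i] = i^{-1}

# x ~ N(0, H) with H = diag(λ); y = ⟨w*, x⟩ + noise, noise variance σ² = 1
X = rng.standard_normal((n, d)) * np.sqrt(lam)
y = X @ w_star + sigma * rng.standard_normal(n)

gamma = 0.01          # illustrative constant stepsize, NOT the paper's tuned γ
w = np.zeros(d)
iterates = np.empty((n, d))
for t in range(n):
    # SGD update: w_t = w_{t-1} + γ (y_t − ⟨w_{t-1}, x_t⟩) x_t
    w = w + gamma * (y[t] - X[t] @ w) * X[t]
    iterates[t] = w

w_tail = iterates[n // 2:].mean(axis=0)  # tail-averaged iterate (last half)

def excess_risk(v):
    """Population excess risk (v − w*)ᵀ H (v − w*) for diagonal H."""
    return float(lam @ (v - w_star) ** 2)

print("null predictor:", excess_risk(np.zeros(d)))
print("tail-averaged SGD:", excess_risk(w_tail))
```

Because H is diagonal, the population excess risk is available in closed form, so no held-out test set is needed to evaluate generalization in this sketch.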