The Benefits of Implicit Regularization from SGD in Least Squares Problems

Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham Kakade

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments. We perform experiments on Gaussian least square problem." ... "The empirical observations are pretty consistent with our theoretical findings and again demonstrate the benefit of the implicit regularization of SGD."
Researcher Affiliation | Collaboration | Difan Zou (University of California, Los Angeles, knowzou@cs.ucla.edu); Jingfeng Wu (Johns Hopkins University, uuujf@jhu.edu); Vladimir Braverman (Johns Hopkins University, vova@cs.jhu.edu); Quanquan Gu (University of California, Los Angeles, qgu@cs.ucla.edu); Dean P. Foster (Amazon, dean@foster.net); Sham M. Kakade (University of Washington & Microsoft Research, sham@cs.washington.edu)
Pseudocode | No | The information is insufficient. The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The information is insufficient. The reproducibility checklist explicitly states 'N/A' for including the code needed to reproduce the results, and no other statement about or link to open-source code is provided.
Open Datasets | No | The information is insufficient. The paper describes generating synthetic data from specified distributions and parameters (e.g., 'Gaussian least squares problems', a 'one-hot data distribution') but does not refer to a pre-existing, publicly available dataset with concrete access information (a link, DOI, or formal citation).
Dataset Splits | No | The information is insufficient. The paper does not specify training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined splits).
Hardware Specification | No | The information is insufficient. The reproducibility checklist states 'N/A' for hardware details, and the paper does not specify any particular GPU or CPU model, memory size, or other hardware used for the experiments.
Software Dependencies | No | The information is insufficient. The paper does not name specific software with version numbers (e.g., programming languages, libraries, or frameworks) as needed for reproducibility.
Experiment Setup | Yes | "We perform experiments on Gaussian least square problem. ... The problem dimension is d = 200 and the variance of model noise is σ^2 = 1. We consider 6 problem instances, which are the combinations of 2 different covariance matrices H: λ_i = i^{-1} and λ_i = i^{-2}; and 3 different true model parameter vectors w*: w*[i] = 1, w*[i] = i^{-1}, and w*[i] = i^{-10}. ... where the hyperparameters (i.e., γ and λ) are fine-tuned to achieve the best performance." A hedged code sketch of this setup follows the table.
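
Since no code is released (see the Open Source Code row), the following is a minimal, hypothetical Python sketch of the setup quoted above: synthetic Gaussian least squares data with d = 200 and σ^2 = 1, using the λ_i = i^{-1} spectrum and the w*[i] = 1 instance, and comparing one pass of constant-stepsize SGD with tail (iterate) averaging against a ridge regression baseline. The sample size n and the values of γ and λ below are illustrative placeholders (the paper fine-tunes γ and λ); this is not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

d = 200      # problem dimension (stated in the paper)
n = 1000     # sample size (assumed here; not stated in the excerpt)
sigma = 1.0  # model noise standard deviation, so sigma^2 = 1

# Diagonal covariance H with spectrum lambda_i = i^{-1}
# (the paper also considers lambda_i = i^{-2}).
lam_spec = 1.0 / np.arange(1, d + 1)

# True parameter w*[i] = 1 (the paper also considers i^{-1} and i^{-10}).
w_star = np.ones(d)

# Data: x ~ N(0, H), y = <w*, x> + N(0, sigma^2) noise.
X = rng.standard_normal((n, d)) * np.sqrt(lam_spec)
y = X @ w_star + sigma * rng.standard_normal(n)

# One pass of constant-stepsize SGD with tail (iterate) averaging.
gamma = 0.01  # step size gamma; fine-tuned in the paper, fixed here
w = np.zeros(d)
tail_sum = np.zeros(d)
tail_count = 0
for t, (x_t, y_t) in enumerate(zip(X, y)):
    w = w - gamma * (x_t @ w - y_t) * x_t  # stochastic gradient step
    if t >= n // 2:                        # average the last half of the iterates
        tail_sum += w
        tail_count += 1
w_sgd = tail_sum / tail_count

# Ridge regression baseline; lam is fine-tuned in the paper, fixed here.
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Excess risk (w_hat - w*)^T H (w_hat - w*); H is diagonal with entries lam_spec.
def excess_risk(w_hat):
    diff = w_hat - w_star
    return float(np.sum(lam_spec * diff**2))

print("SGD excess risk:  ", excess_risk(w_sgd))
print("Ridge excess risk:", excess_risk(w_ridge))

For a diagonal H, the excess risk of an estimate ŵ has the closed form (ŵ − w*)^T H (ŵ − w*) = Σ_i λ_i (ŵ[i] − w*[i])^2, which is what excess_risk computes; a full reproduction would sweep γ and λ over a grid and report the best value of each, as the paper does.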