Benign Underfitting of Stochastic Gradient Descent

Authors: Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study to what extent may stochastic gradient descent (SGD) be understood as a conventional learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one pass, without-replacement) SGD is classically known to minimize the population risk at rate O(1/√n), and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of Ω(1). (A worked form of this risk decomposition appears after the table.)
Researcher Affiliation | Collaboration | Tomer Koren, Blavatnik School of Computer Science, Tel Aviv University and Google Research (tkoren@tauex.tau.ac.il); Roi Livni, Department of Electrical Engineering, Tel Aviv University (rlivni@tauex.tau.ac.il); Yishay Mansour, Blavatnik School of Computer Science, Tel Aviv University and Google Research (mansour.yishay@gmail.com); Uri Sherman, Blavatnik School of Computer Science, Tel Aviv University (urisherman@mail.tau.ac.il)
Pseudocode | No | The paper describes the SGD update rule in text rather than providing structured pseudocode or an algorithm block: 'Throughout most of the paper, we consider one-pass projected SGD over S; initialize at w_1 ∈ W; for t = 1, 2, . . . , n: w_{t+1} ← Π_W(w_t − η g_t), with g_t ∈ ∂f(w_t; z_t).' (A hedged code sketch of this update follows the table.)
Open Source Code | No | The paper does not provide access to source code. In the checklist, it states: '3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]'
Open Datasets | No | The paper is theoretical and constructs problem instances for analysis; it does not use, or provide access information for, any empirical training dataset. The checklist for experimental details confirms this with '[N/A]'.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation splits of a dataset. The checklist for experimental details confirms this with '[N/A]'.
Hardware Specification | No | The paper is theoretical and does not describe hardware used for experiments. The checklist explicitly states: '3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]'
Software Dependencies | No | The paper is theoretical and does not describe software dependencies with specific version numbers. The checklist for experimental details confirms this with '[N/A]'.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with details such as hyperparameters or training configurations. The checklist for experimental details confirms this with '[N/A]'.
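To make the Research Type row's headline claim concrete, here is the standard risk decomposition the abstract alludes to, written out in LaTeX. The notation (F for population risk, F̂ for empirical risk, w_S for the SGD output) is ours for illustration and not verbatim from the paper; the Ω(1) gap is stated in absolute value since underfitting means the empirical risk exceeds the population risk.

% Notation (assumed for illustration, not quoted from the paper):
% F = population risk, \widehat{F} = empirical risk over the n-sample S,
% w_S = the output of one-pass projected SGD run on S.
\[
F(w) = \mathbb{E}_{z \sim \mathcal{D}}\left[ f(w; z) \right],
\qquad
\widehat{F}(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i).
\]
% The classical upper bound beside the paper's lower bounds:
\[
F(w_S) - \min_{w \in W} F(w) = O\!\left(\frac{1}{\sqrt{n}}\right),
\qquad
\widehat{F}(w_S) = \Omega(1),
\qquad
\left| F(w_S) - \widehat{F}(w_S) \right| = \Omega(1).
\]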
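Because the paper gives the update rule only in prose (see the Pseudocode row), the following is a minimal Python sketch of one-pass projected SGD with iterate averaging, under assumed choices the quoted text does not pin down: Euclidean projection onto an L2 ball, zero initialization, and the uniform average of the iterates as output. grad_f and radius are illustrative placeholders, not names from the paper.

import numpy as np

def project_l2_ball(w, radius=1.0):
    # Euclidean projection onto W = {w : ||w||_2 <= radius}.
    norm = np.linalg.norm(w)
    return w if norm <= radius else (radius / norm) * w

def one_pass_sgd(sample, grad_f, eta, dim, radius=1.0):
    # One-pass projected SGD: w_{t+1} = Pi_W(w_t - eta * g_t),
    # with g_t a (sub)gradient of f(w_t; z_t); each z_t in the
    # n-sample S is visited exactly once, without replacement.
    w = np.zeros(dim)                      # w_1 in W (assumed initialization)
    iterates = [w]
    for z in sample:                       # t = 1, 2, ..., n
        g = grad_f(w, z)                   # g_t in the subgradient of f(w_t; z_t)
        w = project_l2_ball(w - eta * g, radius)
        iterates.append(w)
    return np.mean(iterates[:-1], axis=0)  # average of w_1, ..., w_n

For instance, with a convex loss such as f(w; z) = max(0, 1 - y * w @ x) for z = (x, y), one passes its subgradient as grad_f; a step size eta on the order of 1/√n recovers the classical O(1/√n) population-risk rate quoted above.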