Benign Underfitting of Stochastic Gradient Descent
Authors: Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study to what extent may stochastic gradient descent (SGD) be understood as a conventional learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one pass, without-replacement) SGD is classically known to minimize the population risk at rate 𝑂(1/√𝑛), and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of Ω(1). (These rates are restated in standard notation after the table.) |
| Researcher Affiliation | Collaboration | Tomer Koren, Blavatnik School of Computer Science, Tel Aviv University and Google Research (tkoren@tauex.tau.ac.il); Roi Livni, Department of Electrical Engineering, Tel Aviv University (rlivni@tauex.tau.ac.il); Yishay Mansour, Blavatnik School of Computer Science, Tel Aviv University and Google Research (mansour.yishay@gmail.com); Uri Sherman, Blavatnik School of Computer Science, Tel Aviv University (urisherman@mail.tau.ac.il) |
| Pseudocode | No | The paper describes the SGD update rule in text rather than providing structured pseudocode or an algorithm block: 'Throughout most of the paper, we consider one-pass projected SGD over S; initialize at w_1 ∈ W; for t = 1, 2, . . . , n: w_{t+1} ← Π_W(w_t − η g_t), with g_t ∈ ∂f(w_t; z_t).' A minimal runnable sketch of this update rule is given after the table. |
| Open Source Code | No | The paper does not provide concrete access to source code. In the checklist, it states: '3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]' |
| Open Datasets | No | The paper is theoretical and constructs problem instances for analysis; it does not use an empirical training dataset or provide access information for one. The checklist for experimental details confirms this with '[N/A]'. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation splits for a dataset. The checklist for experimental details confirms this with '[N/A]'. |
| Hardware Specification | No | The paper is theoretical and does not describe the hardware used for experiments. The checklist explicitly states: '3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]' |
| Software Dependencies | No | The paper is theoretical and does not describe software dependencies with specific version numbers. The checklist for experimental details confirms this with '[N/A]'. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific details like hyperparameters or training configurations. The checklist for experimental details confirms this with '[N/A]'. |
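
The abstract quoted in the Research Type row contrasts two kinds of rates for the one-pass SGD output. The following is a hedged restatement in standard stochastic convex optimization notation; the symbols F, F̂, and w^SGD are our notational assumptions for the population risk, empirical risk, and SGD output, not verbatim from the paper.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Notation assumed here (not verbatim from the paper): F is the population risk,
% \widehat{F} the empirical risk on the sample S = (z_1, \dots, z_n), and
% w^{SGD} the output of one-pass, without-replacement SGD.
\[
  F(w) = \mathbb{E}_{z}\bigl[f(w;z)\bigr],
  \qquad
  \widehat{F}(w) = \frac{1}{n}\sum_{i=1}^{n} f(w;z_i).
\]
% Classical guarantee referenced in the abstract: population suboptimality decays as
\[
  F\bigl(w^{\mathrm{SGD}}\bigr) - \min_{w \in W} F(w) = O\!\left(1/\sqrt{n}\right),
\]
% while the paper exhibits problem instances where the empirical risk and the
% generalization gap are both bounded away from zero:
\[
  \widehat{F}\bigl(w^{\mathrm{SGD}}\bigr) = \Omega(1),
  \qquad
  \bigl|\,F\bigl(w^{\mathrm{SGD}}\bigr) - \widehat{F}\bigl(w^{\mathrm{SGD}}\bigr)\bigr| = \Omega(1).
\]
\end{document}
```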
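Since the paper gives the update rule only in prose (see the Pseudocode row), here is a minimal Python sketch of one-pass projected SGD as quoted there. The function and argument names (`one_pass_projected_sgd`, `subgradient`, `project`) and the toy problem at the bottom are illustrative assumptions, not code from the authors.

```python
import numpy as np

def one_pass_projected_sgd(samples, subgradient, project, w1, eta):
    """One-pass (without-replacement) projected SGD as quoted above:
    w_{t+1} <- Pi_W(w_t - eta * g_t), with g_t a subgradient of f(.; z_t) at w_t.

    samples     : iterable of data points z_1, ..., z_n (each used exactly once)
    subgradient : callable (w, z) -> g, a subgradient of f(w; z) at w
    project     : callable w -> Pi_W(w), Euclidean projection onto the domain W
    w1          : initial iterate in W
    eta         : step size
    Returns the average iterate, the output commonly analyzed for one-pass SGD.
    """
    w = np.asarray(w1, dtype=float)
    iterates = [w.copy()]
    for z in samples:                 # one pass: each z_t is visited exactly once
        g = subgradient(w, z)         # g_t in the subdifferential of f(.; z_t) at w_t
        w = project(w - eta * g)      # w_{t+1} = Pi_W(w_t - eta * g_t)
        iterates.append(w.copy())
    return np.mean(iterates, axis=0)  # average of w_1, ..., w_{n+1}


# Example usage on a toy convex problem: f(w; z) = |w . z| over the unit ball W.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 100
    data = rng.normal(size=(n, d))
    project = lambda w: w / max(1.0, np.linalg.norm(w))  # projection onto the unit ball
    subgrad = lambda w, z: np.sign(w @ z) * z            # a subgradient of |w . z| in w
    w_bar = one_pass_projected_sgd(data, subgrad, project,
                                   w1=np.zeros(d), eta=1.0 / np.sqrt(n))
    print("averaged SGD iterate:", w_bar)
```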