Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Authors: Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our work is the first to address uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses. Our lower bounds show that, in the nonsmooth case, (S)GD can be inherently less stable than in the smooth case. On the other hand, our upper bounds show that (S)GD is sufficiently stable for deriving new and useful bounds on generalization error. Most notably, we obtain the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case. *(The form of these bounds is restated, with hedging, below the table.)* |
| Researcher Affiliation | Collaboration | Raef Bassily, Department of Computer Science & Engineering, The Ohio State University (bassily.1@osu.edu); Vitaly Feldman, Apple; Cristóbal Guzmán, Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile, and ANID Millennium Science Initiative Program, Millennium Nucleus Center for the Discovery of Structures in Complex Data (crguzmanp@mat.cl); Kunal Talwar, Apple (ktalwar@apple.com) |
| Pseudocode | Yes | Algorithm 1 (A_rSGD): Sampling-with-replacement SGD *(an unofficial code sketch appears below the table)* |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and uses the concept of a 'dataset' in a general sense (S = (z1, ..., zn)) for theoretical analysis, but does not mention or provide access information for any specific publicly available datasets for training purposes. |
| Dataset Splits | No | The paper is theoretical and does not describe practical experiment setups, including data splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper providing mathematical analyses and does not describe computational experiments or specify any hardware used. |
| Software Dependencies | No | This is a theoretical paper and does not mention any specific software dependencies or versions required for implementation or reproduction. |
| Experiment Setup | No | This is a theoretical paper presenting algorithms and proofs, and therefore does not include details on experimental setup such as hyperparameters or training configurations. |
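
For readers skimming the table, the headline stability result quoted in the Research Type row has, as best I recall, the following form. This is a paraphrase with constants omitted, not a verbatim statement from the paper, and should be checked against the published theorems.

```latex
% Hedged paraphrase of the paper's headline bound (constants omitted;
% exact statements should be checked against the paper). For convex,
% L-Lipschitz (possibly nonsmooth) losses, T steps of projected SGD
% with fixed step size \eta on n samples, the uniform argument
% stability scales as
\[
  \delta_{\mathrm{stab}}(T)
  \;=\; \Theta\!\Big( L\,\eta\sqrt{T} \;+\; \frac{L\,\eta\,T}{n} \Big),
\]
% i.e., upper and lower bounds match up to constants. The extra
% \eta\sqrt{T} term, absent from the smooth case (where stability
% scales as \eta T / n up to Lipschitz/smoothness factors), is what
% makes nonsmooth (S)GD inherently less stable.
```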
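Since the table records pseudocode (Algorithm 1) but no released source code, here is a minimal, unofficial Python sketch of sampling-with-replacement SGD in the spirit of A_rSGD. The subgradient oracle, the l2-ball projection, the fixed step size, and the averaged output are assumptions made for concreteness; the paper's own pseudocode is authoritative.

```python
import numpy as np

def sgd_with_replacement(data, subgrad, dim, steps, eta, radius=1.0, seed=0):
    """Projected SGD sampling one example uniformly with replacement
    per step, in the spirit of the paper's Algorithm 1 (A_rSGD).

    Assumptions (not taken from the paper's pseudocode): projection
    onto an l2 ball of the given radius, a fixed step size eta, and
    returning the average of the iterates.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    iterates = [w.copy()]
    for _ in range(steps):
        z = data[rng.integers(len(data))]  # uniform sampling with replacement
        w = w - eta * subgrad(w, z)        # subgradient step (loss may be nonsmooth)
        norm = np.linalg.norm(w)
        if norm > radius:                  # Euclidean projection onto the l2 ball
            w = w * (radius / norm)
        iterates.append(w.copy())
    return np.mean(iterates, axis=0)       # averaged iterate

# Hypothetical usage: absolute loss |<w, x> - y|, a nonsmooth convex loss.
data = [(np.array([1.0, -0.5]), 0.3), (np.array([0.2, 0.8]), -0.1)]
subgrad = lambda w, z: np.sign(w @ z[0] - z[1]) * z[0]
w_bar = sgd_with_replacement(data, subgrad, dim=2, steps=200, eta=0.05)
```

Averaging the iterates matches the standard presentation of SGD for nonsmooth convex optimization; outputting the last iterate or a suffix average instead would change the guarantees being analyzed.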