Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Authors: Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our work is the first to address uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses. Our lower bounds show that, in the nonsmooth case, (S)GD can be inherently less stable than in the smooth case. On the other hand, our upper bounds show that (S)GD is sufficiently stable for deriving new and useful bounds on generalization error. Most notably, we obtain the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case. (The standard stability and generalization definitions behind these claims are sketched after this table.)
Researcher Affiliation | Collaboration | Raef Bassily (Department of Computer Science & Engineering, The Ohio State University; bassily.1@osu.edu); Vitaly Feldman (Apple); Cristóbal Guzmán (Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile; ANID Millennium Science Initiative Program, Millennium Nucleus Center for the Discovery of Structures in Complex Data; crguzmanp@mat.cl); Kunal Talwar (Apple; ktalwar@apple.com)
Pseudocode | Yes | Algorithm 1: Sampling-with-replacement SGD (an illustrative implementation sketch appears after this table).
Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and uses the concept of a 'dataset' in a general sense (S = (z1, ..., zn)) for its analysis, but does not mention or provide access information for any specific publicly available datasets.
Dataset Splits | No | The paper is theoretical and does not describe practical experimental setups, including data splits for training, validation, or testing.
Hardware Specification | No | This is a theoretical paper providing mathematical analyses; it does not describe computational experiments or specify any hardware used.
Software Dependencies | No | This is a theoretical paper and does not mention any specific software dependencies or versions required for implementation or reproduction.
Experiment Setup | No | This is a theoretical paper presenting algorithms and proofs, and therefore does not include details of an experimental setup such as hyperparameters or training configurations.
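
For context on the stability and generalization claims quoted under Research Type, the following is a minimal sketch of the standard uniform-stability definition and the generalization bound it implies. The paper itself works with refinements of these notions (in particular, argument-stability variants for Lipschitz losses), so this is background rather than the paper's exact statements.

```latex
% Background only: standard uniform stability and the generalization bound it implies.
% The paper's results use refinements of these notions; constants and exact
% conditions are as stated in the paper.
A randomized algorithm $A$ is $\varepsilon$-uniformly stable if, for all pairs of
datasets $S, S'$ of size $n$ that differ in a single example and all points $z$,
\[
  \mathbb{E}_{A}\bigl[\ell(A(S); z) - \ell(A(S'); z)\bigr] \le \varepsilon .
\]
Uniform stability controls the expected generalization gap: for the population risk
$F(w) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(w; z)]$ and the empirical risk
$F_S(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w; z_i)$,
\[
  \Bigl| \mathbb{E}_{S, A}\bigl[ F(A(S)) - F_S(A(S)) \bigr] \Bigr| \le \varepsilon .
\]
```

The abstract's contrast between the smooth and nonsmooth cases concerns how small such a stability parameter can be made for (S)GD as a function of the step size, the number of iterations, and the sample size n.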
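The Pseudocode row refers to the paper's Algorithm 1, a sampling-with-replacement form of SGD. Below is a minimal, hypothetical Python sketch of projected subgradient SGD with replacement sampling; the function names, the Euclidean-ball projection, the constant step size, and the averaged output are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np


def project_to_ball(w, radius):
    """Euclidean projection onto a ball of the given radius (illustrative feasible set)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)


def sgd_with_replacement(data, subgradient, w0, eta, T, radius=1.0, seed=0):
    """Hypothetical sketch of projected subgradient SGD with replacement sampling.

    data        : sequence of n examples S = (z_1, ..., z_n)
    subgradient : callable (w, z) -> a subgradient of the loss ell(w; z) at w
    w0          : starting point in R^d
    eta         : constant step size (an assumption; other schedules are possible)
    T           : number of steps
    Returns the average of the iterates, a standard output for nonsmooth convex losses.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    w = np.array(w0, dtype=float)
    avg = np.zeros_like(w)
    for _ in range(T):
        i = rng.integers(n)                       # draw an index uniformly, with replacement
        g = np.asarray(subgradient(w, data[i]), dtype=float)
        w = project_to_ball(w - eta * g, radius)  # subgradient step, then projection
        avg += w
    return avg / T


# Toy usage: absolute loss |w - z|, a 1-Lipschitz nonsmooth convex loss.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    S = rng.normal(loc=0.3, scale=1.0, size=50)   # n = 50 scalar examples
    subgrad = lambda w, z: np.sign(w - z)         # a subgradient of |w - z|
    w_hat = sgd_with_replacement(S, subgrad, w0=np.zeros(1), eta=0.05, T=2000)
    print("estimate:", w_hat)
```

Averaging the iterates is one standard output choice for nonsmooth convex optimization; the paper analyzes several forms of SGD and full-batch GD, so this sketch should be read as one plausible instantiation rather than the analyzed algorithm.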