Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers
Authors: Sejun Park, Umut Simsekli, Murat A. Erdogdu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality, in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces, i.e., non-convex and non-smooth in general, the generalization error can be upper bounded by $O(\sqrt{\log n \log(nP)}/n)$, where $n$ is the number of data samples (the bound is restated below the table). In particular, this rate is independent of dimension and does not require early stopping or a decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and $K$-means clustering for both hard and soft label setups, improving the known state-of-the-art rates. |
| Researcher Affiliation | Academia | Sejun Park (Korea University, sejun.park000@gmail.com); Umut Şimşekli (DI ENS, École Normale Supérieure, Université PSL, CNRS, INRIA, umut.simsekli@inria.fr); Murat A. Erdogdu (University of Toronto & Vector Institute, erdogdu@cs.toronto.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper focuses on theoretical analysis and does not involve empirical studies with specific datasets. The ethics statement indicates N/A for experiments. |
| Dataset Splits | No | The paper does not provide specific dataset split information as it is a theoretical work. |
| Hardware Specification | No | The paper is theoretical and does not involve running experiments on specific hardware. The ethics statement indicates N/A for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with version numbers. The ethics statement indicates N/A for experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. The ethics statement indicates N/A for experiments. |
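
For reference, the headline result quoted in the Research Type row above can be restated as the following bound. This is a sketch of the rate as given in the abstract; the exact constants, the meaning of "finite perturbation", and the precise strong convexity and smoothness conditions are defined in the paper itself.

```latex
% Headline generalization bound, restated from the abstract.
% Assumptions (per the abstract): the objective is a finite perturbation
% of a piecewise strongly convex and smooth function with P pieces,
% and n is the number of data samples.
\[
  \text{generalization error of SGD}
  \;=\;
  O\!\left( \frac{\sqrt{\log n \,\log(nP)}}{n} \right).
\]
% Per the abstract, this rate is dimension-independent and holds
% without early stopping or a decaying step size.
```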