Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers

Authors: Sejun Park, Umut Şimşekli, Murat A. Erdogdu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality, in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces, i.e., non-convex and non-smooth in general, the generalization error can be upper bounded by $O(\sqrt{\log n \log(nP)}/n)$, where $n$ is the number of data samples (see the LaTeX rendering after the table). In particular, this rate is independent of dimension and requires neither early stopping nor a decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and $K$-means clustering for both hard and soft label setups, improving the known state-of-the-art rates.
Researcher Affiliation | Academia | Sejun Park (Korea University, sejun.park000@gmail.com); Umut Şimşekli (DI ENS, École Normale Supérieure, Université PSL, CNRS, INRIA, umut.simsekli@inria.fr); Murat A. Erdogdu (University of Toronto & Vector Institute, erdogdu@cs.toronto.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper focuses on theoretical analysis and does not involve empirical studies with specific datasets. The ethics statement indicates N/A for experiments.
Dataset Splits | No | The paper does not provide specific dataset split information, as it is a theoretical work.
Hardware Specification | No | The paper is theoretical and does not involve running experiments on specific hardware. The ethics statement indicates N/A for experiments.
Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with version numbers. The ethics statement indicates N/A for experiments.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. The ethics statement indicates N/A for experiments.
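
For reference, the generalization bound quoted in the Research Type row above is rendered below as a LaTeX display. This is a sketch of the statement as given in the abstract, not a restatement of the formal theorem: $n$ denotes the number of data samples, $P$ the number of pieces of the piecewise strongly convex and smooth objective, and the constants hidden in the $O(\cdot)$ are not reproduced here.

\[
  \text{generalization error}
    \;\le\;
    O\!\left(\frac{\sqrt{\log n \,\log(nP)}}{n}\right),
\]

which, per the abstract, is independent of the ambient dimension and requires neither early stopping nor a decaying step size.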