Tight Risk Bounds for Gradient Descent on Separable Data
Authors: Matan Schliserman, Tomer Koren
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the generalization properties of unregularized gradient methods applied to separable linear classification, a setting that has received considerable attention since the pioneering work of Soudry et al. [14]. We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form Θ(r²_{ℓ,T}/(γ²T) + r²_{ℓ,n}/(γ²n)), where T is the number of gradient steps, n is the size of the training set, γ is the data margin, and r_{ℓ,T} is a complexity term that depends on the tail decay rate of the loss function (and on T). Our upper bound greatly improves the existing risk bounds due to Shamir [13] and Schliserman and Koren [12], which either applied to specific loss functions or imposed extraneous technical assumptions, and applies to virtually any convex and smooth loss function. Our risk lower bound is the first in this context and establishes the tightness of our general upper bound for any given tail decay rate and in all parameter regimes. The proof technique used to show these results is also markedly simpler compared to previous work, and is straightforward to extend to other gradient methods; we illustrate this by providing analogous results for Stochastic Gradient Descent. |
| Researcher Affiliation | Collaboration | (1) Blavatnik School of Computer Science, Tel Aviv University; (2) Google Research, Tel Aviv |
| Pseudocode | No | The paper describes algorithms like Gradient Descent (GD) and Stochastic Gradient Descent (SGD) mathematically but does not include any pseudocode or algorithm blocks; a minimal illustrative GD sketch is provided after this table. |
| Open Source Code | No | The paper does not contain any statement about making source code publicly available or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper defines a theoretical distribution D from which training examples are drawn (a sample S = {z1, ..., zn} drawn i.i.d. from D) but does not mention the use of any specific publicly available dataset. |
| Dataset Splits | No | The paper uses the term "training set" only for the theoretical sample S of n i.i.d. examples drawn from D; it does not describe any train/validation/test splits. |
| Hardware Specification | No | The paper does not mention any specific hardware used for computations or experiments. It is a theoretical paper focusing on mathematical bounds. |
| Software Dependencies | No | The paper does not specify any software dependencies or version numbers. It is a theoretical paper. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical bounds, not on empirical experiments. While it discusses quantities such as the step size and the number of gradient steps T, these appear only as parameters of the theoretical analysis, not as part of an experimental setup. |
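Although the paper reports no experiments and releases no code, the object it analyzes is easy to simulate. The following is a minimal, hypothetical sketch (not taken from the paper) of unregularized gradient descent with the smooth logistic loss on synthetic linearly separable data; the data generation, step size eta, and number of steps T are illustrative assumptions only. It merely exhibits the qualitative behaviour the bounds refer to: the empirical loss decays, the iterate norm grows without bound, and the data margin γ enters as a property of the dataset.

```python
import numpy as np

# Minimal illustrative sketch of the setting analyzed in the paper:
# unregularized gradient descent on linearly separable data with a smooth
# convex loss (here, logistic). NOT code from the paper; the synthetic data,
# step size eta, and step count T are hypothetical choices for illustration.

rng = np.random.default_rng(0)

# Synthetic separable data: labels given by a unit-norm ground-truth direction.
n, d = 200, 5
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
scores = X @ w_star
keep = np.abs(scores) >= 0.1          # enforce a strictly positive margin
X, y = X[keep], np.sign(scores[keep])

# Data margin gamma with respect to the unit-norm separator w_star.
gamma = np.min(y * (X @ w_star))

def sigmoid(z):
    # Numerically stable logistic sigmoid.
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def loss(w):
    # Average logistic loss: mean log(1 + exp(-y <w, x>)).
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    # Gradient of the average logistic loss.
    coeff = -y * sigmoid(-y * (X @ w))
    return (coeff[:, None] * X).mean(axis=0)

# Unregularized gradient descent: w_{t+1} = w_t - eta * grad(w_t).
w = np.zeros(d)
eta, T = 1.0, 1000                    # eta chosen below 1/beta for this data

for _ in range(T):
    w -= eta * grad(w)

print(f"data margin gamma       : {gamma:.3f}")
print(f"training loss at step T : {loss(w):.2e}")
print(f"||w_T||                 : {np.linalg.norm(w):.2f}  (grows; no regularization)")
print(f"cos(w_T, w_star)        : {w @ w_star / np.linalg.norm(w):.3f}")
```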