Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks

Authors: Roey Magen, Ohad Shamir

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We provide several new results on the sample complexity of vector-valued linear predictors (parameterized by a matrix), and more generally neural networks. Focusing on size-independent bounds, where only the Frobenius norm distance of the parameters from some fixed reference matrix W0 is controlled, we show that the sample complexity behavior can be surprisingly different than what we may expect considering the well-studied setting of scalar-valued linear predictors. This also leads to new sample complexity bounds for feed-forward neural networks, tackling some open questions in the literature, and establishing a new convex linear prediction problem that is provably learnable without uniform convergence. |
| Researcher Affiliation | Academia | Roey Magen, Weizmann Institute of Science, roey.magen@weizmann.ac.il; Ohad Shamir, Weizmann Institute of Science, ohad.shamir@weizmann.ac.il |
| Pseudocode | Yes | Algorithm 1: Stochastic Gradient Descent (SGD), with projection step and initialization at W0 |
| Open Source Code | No | The paper does not provide any specific links to open-source code or explicit statements about code availability for the described methodology. |
| Open Datasets | No | The paper discusses theoretical input domains such as 'inputs from {x ∈ ℝ^d : ‖x‖ ≤ 1}' or 'inputs from {x ∈ ℝ^d : ‖x‖ ≤ b_x}', but does not refer to any specific, named, publicly available datasets used for training. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical bounds and proofs. It does not describe any specific dataset splits (training, validation, test) or mention the process of validation for empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or version numbers used for implementation or experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setup details such as hyperparameters, training configurations, or system-level settings. |
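
The Pseudocode entry names an SGD procedure with a projection step and initialization at W0, but this page does not reproduce the algorithm itself. The following is a minimal sketch of that general technique, not the paper's Algorithm 1: it assumes a matrix-valued linear predictor x ↦ Wx, a squared loss, a single pass over the samples, and a projection onto the Frobenius ball {W : ‖W − W0‖_F ≤ radius}. The loss, the names `projected_sgd` and `project_onto_ball`, and the parameters `radius` and `step_size` are all illustrative assumptions.

```python
import math
import random


def frobenius_distance(W, W0):
    """Frobenius norm of W - W0, with matrices stored as lists of rows."""
    return math.sqrt(sum((w - w0) ** 2
                         for row, row0 in zip(W, W0)
                         for w, w0 in zip(row, row0)))


def project_onto_ball(W, W0, radius):
    """Project W onto {V : ||V - W0||_F <= radius} by rescaling W - W0."""
    dist = frobenius_distance(W, W0)
    if dist <= radius:
        return W
    scale = radius / dist
    return [[w0 + scale * (w - w0) for w, w0 in zip(row, row0)]
            for row, row0 in zip(W, W0)]


def projected_sgd(samples, W0, radius, step_size, seed=0):
    """One pass of SGD on 0.5 * ||W x - y||^2 (an assumed loss), projecting after each step."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    W = [row[:] for row in W0]  # initialization at the reference matrix W0
    for i in order:
        x, y = samples[i]
        # Residual r = W x - y for the current sample.
        r = [sum(w * xj for w, xj in zip(row, x)) - yk
             for row, yk in zip(W, y)]
        # The gradient of the assumed squared loss is the outer product r x^T.
        W = [[w - step_size * rk * xj for w, xj in zip(row, x)]
             for row, rk in zip(W, r)]
        # Projection step: stay within Frobenius distance `radius` of W0.
        W = project_onto_ball(W, W0, radius)
    return W
```

Under these assumptions, a call such as `projected_sgd(samples, W0, radius=0.5, step_size=1.0)` takes `samples` as a list of `(x, y)` pairs of equal-length float lists and returns a matrix whose Frobenius distance from `W0` is at most `radius`. Returning the last iterate rather than an average is a simplification made for brevity.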