Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks
Authors: Roey Magen, Ohad Shamir
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide several new results on the sample complexity of vector-valued linear predictors (parameterized by a matrix), and more generally neural networks. Focusing on size-independent bounds, where only the Frobenius norm distance of the parameters from some fixed reference matrix W0 is controlled, we show that the sample complexity behavior can be surprisingly different from what we might expect given the well-studied setting of scalar-valued linear predictors. This also leads to new sample complexity bounds for feed-forward neural networks, tackling some open questions in the literature, and establishing a new convex linear prediction problem that is provably learnable without uniform convergence. |
| Researcher Affiliation | Academia | Roey Magen, Weizmann Institute of Science, roey.magen@weizmann.ac.il; Ohad Shamir, Weizmann Institute of Science, ohad.shamir@weizmann.ac.il |
| Pseudocode | Yes | Algorithm 1 Stochastic Gradient Descent (SGD), with projection step and initialization at W0 |
| Open Source Code | No | The paper does not provide any specific links to open-source code or explicit statements about code availability for the described methodology. |
| Open Datasets | No | The paper discusses theoretical input domains such as inputs from {x ∈ ℝ^d : ‖x‖ ≤ 1} or {x ∈ ℝ^d : ‖x‖ ≤ b_x}, but does not refer to any specific, named, publicly available datasets used for training. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical bounds and proofs. It does not describe any specific dataset splits (training, validation, test) or mention the process of validation for empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or version numbers used for implementation or experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe specific experimental setup details such as hyperparameters, training configurations, or system-level settings. |
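The pseudocode reported above (Algorithm 1) is standard projected SGD, initialized at a reference matrix W0 and projected back onto a Frobenius-norm ball around W0 after each step. A minimal sketch of that pattern follows; note this is illustrative only — the stochastic gradient oracle, step size, and iteration count are placeholder assumptions, not the paper's specification:

```python
import numpy as np


def project_frobenius_ball(W, W0, radius):
    """Project W onto the Frobenius-norm ball of the given radius centered at W0."""
    diff = W - W0
    norm = np.linalg.norm(diff)  # Frobenius norm of the displacement from W0
    if norm <= radius:
        return W
    # Rescale the displacement so it lies on the ball's boundary.
    return W0 + diff * (radius / norm)


def projected_sgd(grad_fn, W0, radius, lr=0.1, steps=100, seed=0):
    """Projected SGD: initialize at W0, take stochastic gradient steps,
    and project back onto the Frobenius ball around W0 after each one.

    grad_fn(W, rng) is a user-supplied stochastic gradient oracle
    (a placeholder here, not the paper's loss).
    """
    rng = np.random.default_rng(seed)
    W = W0.copy()
    for _ in range(steps):
        W = W - lr * grad_fn(W, rng)
        W = project_frobenius_ball(W, W0, radius)
    return W
```

For instance, with `grad_fn = lambda W, rng: W - T` (the gradient of ½‖W − T‖² for some target matrix T), the iterates move toward T but never leave the radius-`radius` Frobenius ball around W0 — matching the norm-constrained setting the paper's bounds are stated in.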