Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Authors: Surbhi Goel, Adam Klivans

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work we show that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g. feed-forward networks of ReLUs). We make no assumptions on the structure of the network or the labels. Given sufficiently-strong eigenvalue decay, we obtain fully-polynomial time algorithms in all the relevant parameters with respect to square-loss. This is the first purely distributional assumption that leads to polynomial-time algorithms for networks of ReLUs. Further, unlike prior distributional assumptions (e.g., the marginal distribution is Gaussian), eigenvalue decay has been observed in practice on common data sets. (An illustrative sketch of Gram-matrix eigenvalue decay follows the table.)
Researcher Affiliation | Academia | Surbhi Goel, Department of Computer Science, University of Texas at Austin, surbhi@cs.utexas.edu; Adam Klivans, Department of Computer Science, University of Texas at Austin, klivans@cs.utexas.edu
Pseudocode | Yes | Algorithm 1: Compressed Kernel Regression. (A hedged code sketch of this idea also follows the table.)
Open Source Code | No | The paper does not provide any statement about, or link to, open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe training on a specific, publicly available dataset with access information. It mentions 'common data sets' only in a general sense, without details that would support reproducibility.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, thus no validation dataset splits are mentioned.
Hardware Specification | No | The paper does not mention any specific hardware used for running experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments; therefore, it does not provide details on experimental setup or hyperparameters.
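
The paper's central assumption is eigenvalue decay of the Gram matrix, which the authors note has been observed in practice on common data sets. The following is a minimal sketch, not taken from the paper, of how one might inspect that decay numerically; the RBF kernel, bandwidth, synthetic data, and function names are illustrative assumptions.

```python
# Minimal sketch (not from the paper): compute the eigenvalue spectrum of a
# kernel Gram matrix on synthetic data to see how quickly it decays.
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 synthetic samples, 10 features
K = rbf_gram(X)
eigvals = np.linalg.eigvalsh(K)[::-1]    # eigenvalues, largest first

# Ratio of each eigenvalue to the largest one: rapid decay of these ratios is
# the kind of distributional property the paper's assumption formalizes.
for i in (0, 9, 49, 99, 199):
    print(f"lambda_{i + 1} / lambda_1 = {eigvals[i] / eigvals[0]:.2e}")
```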
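Algorithm 1 in the paper is named Compressed Kernel Regression. The sketch below is a hedged, illustrative instantiation of that idea, assuming a rank-k compression via the top eigenvectors of the empirical Gram matrix followed by ridge regression in the compressed subspace; the paper's actual algorithm may differ in how the compression is computed and in its parameter choices, and the function name, k, and lam here are assumptions.

```python
# Hedged sketch of "compressed" kernel ridge regression: keep only the top-k
# eigenpairs of the empirical Gram matrix (a low-rank compression that is
# accurate when the eigenvalues decay quickly) and solve ridge regression in
# that subspace. Illustrative only; not necessarily the paper's Algorithm 1.
import numpy as np

def compressed_kernel_regression(K, y, k, lam=1e-3):
    """Return dual coefficients alpha so that predictions are K_new @ alpha.

    K   : (n, n) Gram matrix of the training points
    y   : (n,)  real-valued labels
    k   : number of eigen-directions kept (compression level)
    lam : ridge regularization strength
    """
    eigvals, eigvecs = np.linalg.eigh(K)
    top = np.argsort(eigvals)[::-1][:k]          # indices of top-k eigenpairs
    V, D = eigvecs[:, top], np.maximum(eigvals[top], 1e-12)
    Z = V * np.sqrt(D)                           # compressed features, Z @ Z.T approximates K
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y)
    return V @ (w / np.sqrt(D))                  # lift back to dual coefficients

# Illustrative usage with the Gram matrix K from the previous sketch:
#   y = rng.normal(size=200)                     # placeholder labels
#   alpha = compressed_kernel_regression(K, y, k=20)
#   train_predictions = K @ alpha
```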