Most Neural Networks Are Almost Learnable

Authors: Amit Daniely, Nati Srebro, Gal Vardi

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present a PTAS for learning random constant-depth networks. We show that for any fixed ϵ > 0 and depth i, there is a poly-time algorithm that for any distribution on √d · 𝕊^{d−1} learns random Xavier networks of depth i, up to an additive error of ϵ. The algorithm runs in time and sample complexity of (d̄)^{poly(ϵ⁻¹)}, where d̄ is the size of the network. For some cases of sigmoid and ReLU-like activations the bound can be improved to (d̄)^{polylog(ϵ⁻¹)}, resulting in a quasi-poly-time algorithm for learning constant-depth random networks. The main technical idea in our work is that constant-depth random neural networks with Lipschitz activations can be approximated sufficiently well by low-degree polynomials. This result follows by analyzing the network obtained by replacing each activation function with its polynomial approximation using Hermite polynomials. (An illustrative sketch of this Hermite-approximation step follows the table.)
Researcher Affiliation | Collaboration | Amit Daniely (Hebrew University and Google), amit.daniely@mail.huji.ac.il; Nathan Srebro (TTI-Chicago), nati@ttic.edu; Gal Vardi (TTI-Chicago and Hebrew University), galvardi@ttic.edu
Pseudocode | No | The paper describes its algorithms conceptually but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper neither states unambiguously that the authors are releasing code for this work nor provides a direct link to a source-code repository.
Open Datasets | No | The paper discusses input distributions for its theoretical analysis (e.g., a distribution D on 𝕊^{d−1}) but does not specify, or provide access information for, a publicly available dataset used for training.
Dataset Splits | No | The paper is theoretical and describes no experiments on specific datasets, so it provides no training/validation/test splits.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used to run experiments.
Software Dependencies | No | The paper is theoretical and does not list software dependencies with version numbers needed to replicate experiments.
Experiment Setup | No | The paper is theoretical and does not provide experimental setup details such as hyperparameter values or training configurations.
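
The abstract's key technical claim is that constant-depth random networks with Lipschitz activations are well approximated by low-degree polynomials obtained from Hermite expansions of the activations. The sketch below is not the paper's learning algorithm; it is a minimal, hedged illustration of the single-activation step, assuming a ReLU activation and Monte Carlo estimation of the probabilists' Hermite coefficients under a standard Gaussian input (the `relu` helper, degree 8, and sample size are illustrative choices, not values from the paper).

```python
# Minimal illustration (not the authors' algorithm): approximate an activation by a
# low-degree polynomial via a truncated Hermite expansion under N(0, 1).
# Assumptions: ReLU activation, Monte Carlo estimation of the expansion coefficients.
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He  # probabilists' Hermite polynomials He_n

rng = np.random.default_rng(0)
g = rng.standard_normal(200_000)    # samples of a standard Gaussian pre-activation

def relu(x):
    return np.maximum(x, 0.0)

degree = 8                          # truncation degree of the expansion
coeffs = np.zeros(degree + 1)
for n in range(degree + 1):
    basis = np.zeros(n + 1)
    basis[n] = 1.0
    he_n = He.hermeval(g, basis)    # He_n(g); orthogonality gives E[He_n(g)^2] = n!
    coeffs[n] = np.mean(relu(g) * he_n) / factorial(n)

poly = He.hermeval(g, coeffs)       # low-degree polynomial surrogate, evaluated on g
l2_err = np.sqrt(np.mean((relu(g) - poly) ** 2))
print(f"L2(N(0,1)) error of the degree-{degree} surrogate: {l2_err:.4f}")
```

Composing such degree-k surrogates through a depth-i network yields a polynomial whose degree depends only on k and i, which is the approximation step the abstract refers to; the paper's guarantees rely on an analysis of random Xavier-initialized weights rather than on an empirical check like this one.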