Most Neural Networks Are Almost Learnable
Authors: Amit Daniely, Nati Srebro, Gal Vardi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present a PTAS for learning random constant-depth networks. We show that for any fixed ϵ > 0 and depth i, there is a poly-time algorithm that, for any distribution on the sphere S^{d−1}, learns random Xavier networks of depth i up to an additive error of ϵ. The algorithm runs in time and sample complexity (d̄)^{poly(1/ϵ)}, where d̄ is the size of the network. For some cases of sigmoid and ReLU-like activations the bound can be improved to (d̄)^{polylog(1/ϵ)}, resulting in a quasi-poly-time algorithm for learning constant-depth random networks. The main technical idea in our work is that constant-depth random neural networks with Lipschitz activations can be approximated sufficiently well by low-degree polynomials. This result follows by analyzing the network obtained by replacing each activation function with its polynomial approximation using Hermite polynomials (a minimal code sketch of this idea appears after the table). |
| Researcher Affiliation | Collaboration | Amit Daniely (Hebrew University and Google), amit.daniely@mail.huji.ac.il; Nathan Srebro (TTI-Chicago), nati@ttic.edu; Gal Vardi (TTI-Chicago and Hebrew University), galvardi@ttic.edu |
| Pseudocode | No | The paper describes algorithms conceptually but does not include any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing their code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | The paper discusses input distributions for theoretical analysis (e.g., a distribution D on the sphere S^{d−1}) but does not specify or provide access information for a publicly available or open dataset used for training. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments on specific datasets; therefore, it does not provide training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers needed to replicate any experiments. |
| Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details such as hyperparameter values or training configurations. |
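
Below is a minimal, illustrative sketch of the paper's central technical idea: approximating a Lipschitz activation by a low-degree polynomial built from (probabilists') Hermite polynomials, which are orthogonal under the standard Gaussian measure. This is not the authors' code; the choice of ReLU, the truncation degree, the quadrature size, and all function names are assumptions made for illustration only.

```python
# Sketch (assumption, not the paper's implementation): project an activation
# onto the first few probabilists' Hermite polynomials He_k, which are
# orthogonal under N(0, 1), and use the truncated series as a low-degree
# polynomial surrogate for the activation.
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi


def hermite_coefficients(activation, degree, quad_points=80):
    """Coefficients c_k = E[activation(X) * He_k(X)] / k! for X ~ N(0, 1),
    estimated with Gauss-HermiteE quadrature (weight function exp(-x^2 / 2))."""
    x, w = He.hermegauss(quad_points)        # quadrature nodes and weights
    coeffs = []
    for k in range(degree + 1):
        ek = np.zeros(k + 1)
        ek[k] = 1.0                           # basis vector selecting He_k
        he_k = He.hermeval(x, ek)
        # divide by sqrt(2*pi) to turn the weighted sum into a Gaussian expectation
        moment = np.sum(w * activation(x) * he_k) / sqrt(2.0 * pi)
        coeffs.append(moment / factorial(k))  # orthogonality: E[He_k^2] = k!
    return np.array(coeffs)


def polynomial_approximation(activation, degree):
    """Return a callable low-degree polynomial approximating `activation`."""
    c = hermite_coefficients(activation, degree)
    return lambda x: He.hermeval(x, c)


if __name__ == "__main__":
    relu = lambda x: np.maximum(x, 0.0)           # a ReLU-like Lipschitz activation
    approx = polynomial_approximation(relu, degree=8)
    xs = np.random.randn(100_000)                 # Gaussian inputs, as in the analysis
    err = np.mean((relu(xs) - approx(xs)) ** 2)   # L2 error under the Gaussian measure
    print(f"mean squared approximation error: {err:.4f}")
```

Raising the truncation degree trades polynomial size (and hence running time) for approximation accuracy, which is the trade-off reflected in the (d̄)^{poly(1/ϵ)} and (d̄)^{polylog(1/ϵ)} bounds quoted above.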