Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy

Authors: Amit Daniely, Nati Srebro, Gal Vardi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that learning depth-3 ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where random noise is added to the network's parameters. This implies that learning depth-3 ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-2 networks and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results are under a well-studied assumption on the existence of local pseudorandom generators.
Researcher Affiliation | Collaboration | Amit Daniely (Hebrew University and Google, amit.daniely@mail.huji.ac.il); Nathan Srebro (TTI-Chicago, nati@ttic.edu); Gal Vardi (TTI-Chicago and Hebrew University, galvardi@ttic.edu)
Pseudocode | No | No pseudocode or algorithm blocks are presented or labeled in the paper.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the methodology described.
Open Datasets | No | The paper describes theoretical input distributions (e.g., Gaussian, Bernoulli) within a learning framework but does not mention or provide access information for any publicly available dataset used for training.
Dataset Splits | No | This paper is theoretical and does not describe experiments with data splits for training, validation, or testing.
Hardware Specification | No | The paper does not mention any specific hardware used for running experiments, as it is a theoretical work.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, as it is a theoretical work and does not describe an implementation.
Experiment Setup | No | The paper does not provide details on experimental setup, hyperparameters, or training configurations, as it is a theoretical work.
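The smoothed-analysis setting referenced in the abstract above can be made concrete with a short sketch. The Python snippet below is illustrative only (the paper contains no code): it assumes a depth-2 ReLU network, a hypothetical noise scale tau, and Gaussian perturbations of the parameters before labeled Gaussian inputs are drawn. In the paper's hardness results the unperturbed weights are chosen adversarially, and the theorems state that even after such smoothing, efficient learning remains hard under the local-PRG assumption.

```python
# Illustrative sketch (not from the paper) of the smoothed-analysis setup:
# a depth-2 ReLU network f(x) = v^T relu(W x) whose parameters receive
# Gaussian noise of magnitude tau before the learner observes labeled
# Gaussian inputs. Names and the noise model are assumptions.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def smoothed_target(d, width, tau, rng):
    """Return a depth-2 ReLU network with Gaussian-smoothed parameters."""
    W = rng.standard_normal((width, d))  # adversarial in the paper; random here
    v = rng.standard_normal(width)
    W_s = W + tau * rng.standard_normal(W.shape)  # smooth the weight matrix
    v_s = v + tau * rng.standard_normal(v.shape)  # smooth the output weights
    return lambda x: v_s @ relu(W_s @ x)

rng = np.random.default_rng(0)
f = smoothed_target(d=10, width=5, tau=0.1, rng=rng)
x = rng.standard_normal(10)  # input drawn from the Gaussian distribution
print(f(x))                  # the label the learner would observe
```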