On the non-universality of deep learning: quantifying the cost of symmetry
Authors: Emmanuel Abbe, Enric Boix-Adserà
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD). (A minimal illustrative sketch of this noisy-GD setting appears after this table.) |
| Researcher Affiliation | Collaboration | Emmanuel Abbe, Enric Boix-Adserà. We thank the Simons Foundation and the NSF for supporting us through the Collaboration on the Theoretical Foundations of Deep Learning (deepfoundations.ai). This work was done in part while E.B. was visiting the Simons Institute for the Theory of Computing and the Bernoulli Center at EPFL, and was generously supported by Apple with an AI/ML fellowship. |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. The methods are described mathematically and textually. |
| Open Source Code | No | No mention of open-source code for the described methodology or links to a code repository are present in the paper. |
| Open Datasets | No | The paper is theoretical and does not report on empirical experiments using specific datasets. It refers to abstract data distributions such as P(X, Y) and P(X, R) for theoretical analysis, but not to concrete, publicly accessible datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with data, so no training/validation/test splits are reported. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not conduct experiments, therefore no specific experimental setup details or hyperparameters are provided. |
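The Research Type row describes training fully-connected networks with noisy gradient descent under an equivariance condition. The snippet below is a minimal sketch of that setting, not the authors' code: the depth-2 ReLU architecture, the 2-sparse-parity target on the binary hypercube, and the width, step size, and noise level are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the authors' code): noisy full-batch
# gradient descent on a depth-2 fully-connected ReLU network over {-1,+1}^d.
import numpy as np

rng = np.random.default_rng(0)
d, width, n, steps, lr, noise_std = 10, 64, 2048, 200, 0.05, 0.01

# Uniform inputs on the hypercube {-1,+1}^d; illustrative target: a 2-sparse parity.
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X[:, 0] * X[:, 1]

# Symmetric i.i.d. Gaussian initialization of the one-hidden-layer network.
W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, width))
a = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)

for _ in range(steps):
    H = np.maximum(X @ W, 0.0)                            # hidden activations, (n, width)
    err = H @ a - y                                       # squared-loss residuals, (n,)
    grad_a = H.T @ err / n
    grad_W = X.T @ ((err[:, None] * (H > 0.0)) * a) / n
    # "Noisy GD": Gaussian noise is added to every gradient step.
    a -= lr * (grad_a + noise_std * rng.normal(size=a.shape))
    W -= lr * (grad_W + noise_std * rng.normal(size=W.shape))

mse = float(np.mean((np.maximum(X @ W, 0.0) @ a - y) ** 2))
print(f"final training MSE: {mse:.4f}")
```

Because the initialization is i.i.d. Gaussian and the gradient noise is isotropic, the distribution of this training run is unchanged under permutations and sign flips of the input coordinates; this is the kind of symmetry the equivariance condition quoted in the Research Type row refers to, under the stated assumptions of this sketch.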