On the non-universality of deep learning: quantifying the cost of symmetry

Authors: Emmanuel Abbe, Enric Boix-Adserà

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD). (A sketch of the equivariance condition referenced here follows the table.)
Researcher Affiliation | Collaboration | Emmanuel Abbe, Enric Boix-Adserà. We thank the Simons Foundation and the NSF for supporting us through the Collaboration on the Theoretical Foundations of Deep Learning (deepfoundations.ai). This work was done in part while E.B. was visiting the Simons Institute for the Theory of Computing and the Bernoulli Center at EPFL, and was generously supported by Apple with an AI/ML fellowship.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. The methods are described mathematically and textually.
Open Source Code | No | The paper does not mention open-source code for the described methodology or provide a link to a code repository.
Open Datasets | No | The paper is theoretical and does not report empirical experiments on specific datasets. It refers to abstract data distributions such as P(X, Y) and P(X, R) for its analysis, but not to concrete, publicly accessible training datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with data; thus, no training/validation/test splits are provided.
Hardware Specification | No | The paper is theoretical and does not conduct experiments; therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not conduct experiments; therefore, no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical and does not conduct experiments; therefore, no experimental setup details or hyperparameters are provided.
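
The abstract quoted above hinges on the notion of equivariant GD training. As a minimal sketch (not the paper's formal definition): writing $\mathcal{A}$ for the randomized noisy-GD training map that takes a dataset $D$ to a trained predictor, and $G$ for a group acting on the input space (with $g \cdot D$ denoting the dataset obtained by applying $g$ to every input), the equivariance condition can be stated roughly as

\[
  \mathcal{A}(g \cdot D)(x) \;\stackrel{d}{=}\; \mathcal{A}(D)(g^{-1} \cdot x)
  \qquad \text{for all } g \in G \text{ and all inputs } x,
\]

where $\stackrel{d}{=}$ denotes equality in distribution over the randomness of the initialization and of the gradient noise. The symbols $\mathcal{A}$, $G$, and $g \cdot D$ are notation introduced for this sketch only; the paper's precise statement, including the noise model and the conditions on architecture and initialization, should be consulted for the exact definition.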