On the non-universality of deep learning: quantifying the cost of symmetry

Authors: Emmanuel Abbe, Enric Boix-Adserà

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD). (A sketch of the equivariance condition referenced here follows the table.)
Researcher Affiliation | Collaboration | Emmanuel Abbe, Enric Boix-Adserà. We thank the Simons Foundation and the NSF for supporting us through the Collaboration on the Theoretical Foundations of Deep Learning (deepfoundations.ai). This work was done in part while E.B. was visiting the Simons Institute for the Theory of Computing and the Bernoulli Center at EPFL, and was generously supported by Apple with an AI/ML fellowship.
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. The methods are described mathematically and textually.
Open Source Code | No | The paper does not mention open-source code for the described methodology or provide a link to a code repository.
Open Datasets | No | The paper is theoretical and does not report empirical experiments on specific datasets. It refers to abstract data distributions such as P(X, Y) and P(X, R) for its analysis, but not to concrete, publicly accessible training datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with data; thus, no training/validation/test splits are provided.
Hardware Specification | No | The paper is theoretical and does not conduct experiments; therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not conduct experiments; therefore, no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical and does not conduct experiments; therefore, no experimental setup details or hyperparameters are provided.
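
The abstract quoted above hinges on the notion of equivariant GD training. As a minimal sketch (not the paper's formal definition): writing $\mathcal{A}$ for the randomized noisy-GD training map that takes a dataset $D$ to a trained predictor, and $G$ for a group acting on the input space (with $g \cdot D$ denoting the dataset obtained by applying $g$ to every input), the equivariance condition can be stated roughly as

\[
  \mathcal{A}(g \cdot D)(x) \;\stackrel{d}{=}\; \mathcal{A}(D)(g^{-1} \cdot x)
  \qquad \text{for all } g \in G \text{ and all inputs } x,
\]

where $\stackrel{d}{=}$ denotes equality in distribution over the randomness of the initialization and of the gradient noise. The symbols $\mathcal{A}$, $G$, and $g \cdot D$ are notation introduced for this sketch only; the paper's precise statement, including the noise model and the conditions on architecture and initialization, should be consulted for the exact definition.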