Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the non-universality of deep learning: quantifying the cost of symmetry
Authors: Emmanuel Abbe, Enric Boix-Adsera
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD). |
| Researcher Affiliation | Collaboration | Emmanuel Abbe, Enric Boix-Adserà. We thank the Simons Foundation and the NSF for supporting us through the Collaboration on the Theoretical Foundations of Deep Learning (deepfoundations.ai). This work was done in part while E.B. was visiting the Simons Institute for the Theory of Computing and the Bernoulli Center at EPFL, and was generously supported by Apple with an AI/ML fellowship. |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. The methods are described mathematically and textually. |
| Open Source Code | No | No mention of open-source code for the described methodology or links to a code repository are present in the paper. |
| Open Datasets | No | The paper is theoretical and does not report on empirical experiments using specific datasets. It refers to abstract data distributions such as P(X, Y) and P(X, R) for theoretical analysis, but not to concrete, publicly accessible datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with data, thus no training/validation/test splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not conduct experiments, therefore no specific experimental setup details or hyperparameters are provided. |
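As background on the training procedure named in the abstract, the sketch below illustrates noisy GD and the equivariance property the paper's results hinge on: if the training dynamics commute with a symmetry of the inputs, permuting the input coordinates (and the initialization accordingly) permutes the learned weights. This is a minimal illustration on a linear model with squared loss, not the paper's construction; all function names, the loss, and the hyperparameters are assumptions made for the example.

```python
import numpy as np

def noisy_gd_step(w, X, y, lr, noise_std, rng):
    """One noisy GD step on the mean squared loss (1/2n)||Xw - y||^2
    for a linear model; Gaussian noise is added to the update."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad + noise_std * rng.standard_normal(w.shape)

def train(w0, X, y, steps=50, lr=0.1, noise_std=0.0, seed=0):
    """Run `steps` iterations of (noisy) GD from initialization w0."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(steps):
        w = noisy_gd_step(w, X, y, lr, noise_std, rng)
    return w

# Equivariance check (noise set to 0 so the comparison is exact):
# training on column-permuted data yields the correspondingly
# permuted weight vector, since the zero initialization is
# permutation-invariant and the gradient commutes with permutations.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
perm = np.array([2, 0, 1])
w_orig = train(np.zeros(3), X, y)
w_perm = train(np.zeros(3), X[:, perm], y)
print(np.allclose(w_perm, w_orig[perm]))  # True
```

With noise_std > 0 the identity holds only in distribution (Gaussian noise is permutation-invariant), which is the form of equivariance the paper's lower bounds exploit.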