The staircase property: How hierarchical structure can guide deep learning

Authors: Emmanuel Abbe, Enric Boix-Adsera, Matthew S Brennan, Guy Bresler, Dheeraj Nagaraj

NeurIPS 2021

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental
"We further back our theoretical results with experiments showing that staircase functions are learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the fact that the staircase property has a role to play in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any Statistical Query or PAC algorithm, as recently shown."
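
The staircase functions referenced in the excerpt have a concrete form: the degree-n staircase is S_n(x) = x_1 + x_1 x_2 + ... + x_1 x_2 ... x_n, the sum of all prefix parities of x over the hypercube {−1, 1}^n. The following is a minimal NumPy sketch, written as an illustration rather than the paper's released code, that generates labeled samples for such a target; the sample count and seed are arbitrary choices.

```python
import numpy as np

def staircase(x):
    """Staircase target S_d(x) = x_1 + x_1 x_2 + ... + x_1 ... x_d:
    the sum of the prefix parities of x in {-1, +1}^d."""
    # cumprod along the last axis yields the prefix parities
    return np.cumprod(x, axis=-1).sum(axis=-1)

# Draw i.i.d. uniform inputs from {-1, +1}^n and label them with S_n.
rng = np.random.default_rng(0)          # seed is arbitrary
n, num_samples = 30, 1024               # illustrative sizes
X = rng.choice([-1.0, 1.0], size=(num_samples, n))
y = staircase(X)
```
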
Researcher Affiliation: Academia
Emmanuel Abbe (EPFL, emmanuel.abbe@epfl.ch); Enric Boix-Adsera (MIT, eboix@mit.edu); Matthew Brennan (MIT); Guy Bresler (MIT, guy@mit.edu); Dheeraj Nagaraj (MIT, dheeraj@mit.edu)
Pseudocode: Yes
Algorithm 1: TRAINNETWORKLAYERWISE. Input: sample access to the distribution {(x, g(x))} for x ∈ {−1, 1}^n; hyperparameters W, L, p1, p2, λ1, λ2, …, B, stop, …, …. Output: trained parameters of the neural network after layer-wise training.
Algorithm 2: TRAINNEURON(v, w0; λ1, λ2, …, B, stop, …, …)
Algorithm 3: NEURONSGD(v, w0; λ1, λ2, …, B, stop, …, …)
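
As a rough illustration of what such a layer-wise scheme could look like, the following hypothetical PyTorch sketch trains a fully connected network one layer at a time with SGD on the square loss plus ℓ1/ℓ2 penalties (suggested by λ1, λ2). The function name train_network_layerwise, the freezing scheme, and every hyperparameter value are assumptions; the paper's Algorithms 1 to 3 additionally specify per-neuron updates and stopping rules not reproduced here.

```python
import torch
import torch.nn as nn

def train_network_layerwise(X, y, W=40, L=3, lam1=1e-4, lam2=1e-4,
                            lr=1e-2, B=32, steps_per_layer=500):
    """Hypothetical layer-wise trainer: each layer is optimized in turn
    with regularized SGD while all other parameters stay frozen."""
    dims = [X.shape[1]] + [W] * L + [1]
    layers = nn.ModuleList(nn.Linear(dims[i], dims[i + 1])
                           for i in range(len(dims) - 1))

    def forward(x):
        for layer in layers[:-1]:
            x = torch.relu(layer(x))
        return layers[-1](x).squeeze(-1)

    for trained in layers:                       # one layer at a time
        for layer in layers:                     # freeze everything ...
            for p in layer.parameters():
                p.requires_grad_(layer is trained)   # ... except `trained`
        opt = torch.optim.SGD(trained.parameters(), lr=lr)
        for _ in range(steps_per_layer):
            idx = torch.randint(len(X), (B,))
            w = trained.weight
            # square loss plus l1/l2 regularization on the trained layer
            loss = ((forward(X[idx]) - y[idx]) ** 2).mean() \
                   + lam1 * w.abs().sum() + lam2 * (w ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return layers
```

With X and y given as float tensors (for instance, the staircase samples above converted via torch.from_numpy), this trains each layer in sequence; no claim is made that this simplified variant matches the paper's guarantees.
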
Open Source Code: Yes
"3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material."

Open Datasets: No
The paper focuses on learning mathematically defined "staircase functions" and does not use or provide access information for a publicly available dataset in the conventional sense (e.g., CIFAR-10, ImageNet).

Dataset Splits: No
The paper works with synthetically generated functions and does not explicitly describe dataset splits (e.g., percentages, sample counts) for training, validation, or testing.

Hardware Specification: No
The paper notes that compute resources are detailed in the supplemental material but does not provide specific hardware details (e.g., GPU/CPU models, memory) in the main body.

Software Dependencies: No
The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) in the main text.
Experiment Setup: Yes
"Figure 2: Comparison between training χ_{1:10} and S_{10} with n = 30 on the same 5-layer ReLU ResNet of width 40. Training is SGD with constant step size on the square loss."
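
For concreteness, here is one way the quoted Figure 2 setup might be instantiated. Only the width (40), depth (5 layers), input dimension (n = 30), constant-step-size SGD, and square loss come from the quote; the residual-block layout, batch size, learning rate, and step count are assumptions. Setting staircase=False switches the target from S_10 to the parity χ_{1:10} = x_1 x_2 ... x_10.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Width-preserving residual block: x + ReLU(Wx + b) (layout assumed)."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)
    def forward(self, x):
        return x + torch.relu(self.fc(x))

class ResNet(nn.Module):
    """Sketch of a 5-layer ReLU ResNet of width 40."""
    def __init__(self, n=30, width=40, depth=5):
        super().__init__()
        self.embed = nn.Linear(n, width)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(depth)])
        self.head = nn.Linear(width, 1)
    def forward(self, x):
        return self.head(self.blocks(self.embed(x))).squeeze(-1)

def target(x, staircase=True, k=10):
    prefix = torch.cumprod(x[:, :k], dim=1)     # prefix parities of x_1..x_k
    return prefix.sum(dim=1) if staircase else prefix[:, -1]  # S_10 vs chi_{1:10}

net = ResNet()
opt = torch.optim.SGD(net.parameters(), lr=1e-2)        # constant step size
for step in range(10_000):
    x = torch.randint(0, 2, (64, 30)).float() * 2 - 1   # uniform on {-1,1}^30
    loss = ((net(x) - target(x, staircase=True)) ** 2).mean()  # square loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```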