The staircase property: How hierarchical structure can guide deep learning
Authors: Emmanuel Abbe, Enric Boix-Adsera, Matthew S. Brennan, Guy Bresler, Dheeraj Nagaraj
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further back our theoretical results with experiments showing that staircase functions are learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the fact that the staircase property has a role to play in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any Statistical Query or PAC algorithm, as recently shown. (A data-generation sketch for these staircase targets appears after the table.) |
| Researcher Affiliation | Academia | Emmanuel Abbe (EPFL) emmanuel.abbe@epfl.ch; Enric Boix-Adsera (MIT) eboix@mit.edu; Matthew Brennan (MIT); Guy Bresler (MIT) guy@mit.edu; Dheeraj Nagaraj (MIT) dheeraj@mit.edu |
| Pseudocode | Yes | Algorithm 1: TRAINNETWORKLAYERWISE. Input: sample access to the distribution {(x, g(x))} for x ∈ {−1,1}^n; hyperparameters W, L, p1, p2, λ1, λ2, η, B, ε_stop, …. Output: trained parameters of the neural network after layer-wise training. Algorithm 2: TRAINNEURON(v, w0; λ1, λ2, η, B, ε_stop, …). Algorithm 3: NEURONSGD(v, w0; λ1, λ2, η, B, ε_stop, …). (A hypothetical sketch of the neuron-level SGD appears after the table.) |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. |
| Open Datasets | No | The paper focuses on learning mathematically defined 'staircase functions' and does not use or provide access information for a publicly available dataset in the conventional sense (e.g., CIFAR-10, ImageNet). |
| Dataset Splits | No | The paper works with synthetically generated functions and does not explicitly describe dataset splits (e.g., percentages, sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions compute resources are detailed in supplemental material but does not provide specific hardware details (e.g., GPU/CPU models, memory) in the main body. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) in the main text. |
| Experiment Setup | Yes | Figure 2: Comparison between training χ1:10 and S10 with n = 30 on the same 5-layer ReLU ResNet of width 40. Training is SGD with constant step size on the square loss. (A PyTorch sketch of this setup appears after the table.) |
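
As referenced in the Research Type and Experiment Setup rows, the paper's targets are synthetic Boolean functions: the degree-k staircase S_k(x) = x1 + x1x2 + ⋯ + x1x2⋯xk (the sum of the first k prefix monomials) and the single monomial χ_{1:k}(x) = x1x2⋯xk, both on the hypercube {−1,1}^n. Below is a minimal NumPy sketch for generating such targets; the sample count and seed are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def staircase(x, k):
    """Degree-k staircase S_k(x) = x_1 + x_1 x_2 + ... + x_1 x_2 ... x_k,
    i.e. the sum of the first k prefix monomials."""
    return np.cumprod(x[:, :k], axis=1).sum(axis=1)

def parity(x, k):
    """Degree-k monomial chi_{1:k}(x) = x_1 x_2 ... x_k."""
    return np.prod(x[:, :k], axis=1)

# Uniform samples from the hypercube {-1, 1}^n with n = 30, as in Figure 2.
rng = np.random.default_rng(0)          # seed chosen for illustration only
X = rng.choice([-1.0, 1.0], size=(1024, 30))
y_staircase = staircase(X, 10)          # S_10 targets
y_parity = parity(X, 10)                # chi_{1:10} targets
```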
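The pseudocode row's Algorithms 2-3 train a single neuron with regularized SGD. The sketch below is a hypothetical reading of that inner loop: minibatch SGD on the square loss of one ReLU neuron with ℓ1/ℓ2 penalties. The names λ1, λ2, and B come from the extracted signatures; the step size `eta`, stopping rule `eps_stop`, and the update itself are assumptions, not the paper's exact Algorithm 3.

```python
import numpy as np

def neuron_sgd(X, y, w0, lam1=1e-3, lam2=1e-3, eta=1e-2, B=32,
               eps_stop=1e-4, max_steps=10_000, seed=0):
    """Hypothetical NEURONSGD sketch: minibatch SGD on the square loss of a
    single ReLU neuron x -> relu(w . x), with l1 (lam1) and l2 (lam2)
    regularization, constant step size eta, batch size B, and a stopping
    threshold eps_stop on the size of the update."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float).copy()
    for _ in range(max_steps):
        idx = rng.integers(0, len(X), size=B)
        xb, yb = X[idx], y[idx]
        z = xb @ w
        err = np.maximum(z, 0.0) - yb    # residual of the ReLU output
        # Square-loss gradient through the ReLU, plus l1/l2 (sub)gradients.
        grad = (2.0 / B) * xb.T @ (err * (z > 0)) \
               + lam1 * np.sign(w) + 2.0 * lam2 * w
        step = eta * grad
        w -= step
        if np.linalg.norm(step) < eps_stop:   # updates have stalled; stop
            break
    return w
```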
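For the experiment-setup row, here is one plausible instantiation of the Figure 2 configuration: a 5-layer ReLU ResNet of width 40 trained by constant-step SGD on the square loss with n = 30 inputs. The residual-block layout, learning rate, sample size, and step count below are assumptions; the authors' exact architecture is in their supplemental code.

```python
import torch
import torch.nn as nn

class ReLUResNet(nn.Module):
    """One plausible 5-layer ReLU ResNet of width 40: a linear embedding,
    residual ReLU blocks, and a scalar linear readout."""
    def __init__(self, n_in=30, width=40, depth=5):
        super().__init__()
        self.embed = nn.Linear(n_in, width)
        self.blocks = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))
        self.readout = nn.Linear(width, 1)

    def forward(self, x):
        h = self.embed(x)
        for block in self.blocks:
            h = h + torch.relu(block(h))  # skip connection around each ReLU layer
        return self.readout(h).squeeze(-1)

# Constant-step SGD on the square loss, matching the Figure 2 description.
X = (torch.randint(0, 2, (1024, 30)) * 2 - 1).float()   # uniform on {-1,1}^30
y = torch.cumprod(X[:, :10], dim=1).sum(dim=1)           # staircase target S_10
model = ReLUResNet()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)       # constant step size
for _ in range(2_000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
```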