The staircase property: How hierarchical structure can guide deep learning
Authors: Emmanuel Abbe, Enric Boix-Adsera, Matthew S. Brennan, Guy Bresler, Dheeraj Nagaraj
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further back our theoretical results with experiments showing that staircase functions are learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the fact that the staircase property has a role to play in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any Statistical Query or PAC algorithm, as recently shown. (A data-generation sketch for these staircase targets appears after the table.) |
| Researcher Affiliation | Academia | Emmanuel Abbe (EPFL) emmanuel.abbe@epfl.ch; Enric Boix-Adsera (MIT) eboix@mit.edu; Matthew Brennan (MIT); Guy Bresler (MIT) guy@mit.edu; Dheeraj Nagaraj (MIT) dheeraj@mit.edu |
| Pseudocode | Yes | Algorithm 1: TRAINNETWORKLAYERWISE. Input: sample access to the distribution {(x, g(x))} for x ∈ {−1,1}^n; hyperparameters W, L, p1, p2, λ1, λ2, η, B, ε_stop, …. Output: trained parameters of the neural network after layer-wise training. Algorithm 2: TRAINNEURON(v, w0; λ1, λ2, η, B, ε_stop, …). Algorithm 3: NEURONSGD(v, w0; λ1, λ2, η, B, ε_stop, …). (A hypothetical sketch of the neuron-level SGD appears after the table.) |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. |
| Open Datasets | No | The paper focuses on learning mathematically defined 'staircase functions' and does not use or provide access information for a publicly available dataset in the conventional sense (e.g., CIFAR-10, ImageNet). |
| Dataset Splits | No | The paper works with synthetically generated functions and does not explicitly describe dataset splits (e.g., percentages, sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions compute resources are detailed in supplemental material but does not provide specific hardware details (e.g., GPU/CPU models, memory) in the main body. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) in the main text. |
| Experiment Setup | Yes | Figure 2: Comparison between training χ1:10 and S10 with n = 30 on the same 5-layer ReLU ResNet of width 40. Training is SGD with constant step size on the square loss. (A PyTorch sketch of this setup appears after the table.) |
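
As referenced in the Research Type and Experiment Setup rows, the paper's targets are synthetic Boolean functions: the degree-k staircase S_k(x) = x1 + x1x2 + ⋯ + x1x2⋯xk (the sum of the first k prefix monomials) and the single monomial χ_{1:k}(x) = x1x2⋯xk, both on the hypercube {−1,1}^n. Below is a minimal NumPy sketch for generating such targets; the sample count and seed are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def staircase(x, k):
    """Degree-k staircase S_k(x) = x_1 + x_1 x_2 + ... + x_1 x_2 ... x_k,
    i.e. the sum of the first k prefix monomials."""
    return np.cumprod(x[:, :k], axis=1).sum(axis=1)

def parity(x, k):
    """Degree-k monomial chi_{1:k}(x) = x_1 x_2 ... x_k."""
    return np.prod(x[:, :k], axis=1)

# Uniform samples from the hypercube {-1, 1}^n with n = 30, as in Figure 2.
rng = np.random.default_rng(0)          # seed chosen for illustration only
X = rng.choice([-1.0, 1.0], size=(1024, 30))
y_staircase = staircase(X, 10)          # S_10 targets
y_parity = parity(X, 10)                # chi_{1:10} targets
```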
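The pseudocode row's Algorithms 2-3 train a single neuron with regularized SGD. The sketch below is a hypothetical reading of that inner loop: minibatch SGD on the square loss of one ReLU neuron with ℓ1/ℓ2 penalties. The names λ1, λ2, and B come from the extracted signatures; the step size `eta`, stopping rule `eps_stop`, and the update itself are assumptions, not the paper's exact Algorithm 3.

```python
import numpy as np

def neuron_sgd(X, y, w0, lam1=1e-3, lam2=1e-3, eta=1e-2, B=32,
               eps_stop=1e-4, max_steps=10_000, seed=0):
    """Hypothetical NEURONSGD sketch: minibatch SGD on the square loss of a
    single ReLU neuron x -> relu(w . x), with l1 (lam1) and l2 (lam2)
    regularization, constant step size eta, batch size B, and a stopping
    threshold eps_stop on the size of the update."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float).copy()
    for _ in range(max_steps):
        idx = rng.integers(0, len(X), size=B)
        xb, yb = X[idx], y[idx]
        z = xb @ w
        err = np.maximum(z, 0.0) - yb    # residual of the ReLU output
        # Square-loss gradient through the ReLU, plus l1/l2 (sub)gradients.
        grad = (2.0 / B) * xb.T @ (err * (z > 0)) \
               + lam1 * np.sign(w) + 2.0 * lam2 * w
        step = eta * grad
        w -= step
        if np.linalg.norm(step) < eps_stop:   # updates have stalled; stop
            break
    return w
```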
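For the experiment-setup row, here is one plausible instantiation of the Figure 2 configuration: a 5-layer ReLU ResNet of width 40 trained by constant-step SGD on the square loss with n = 30 inputs. The residual-block layout, learning rate, sample size, and step count below are assumptions; the authors' exact architecture is in their supplemental code.

```python
import torch
import torch.nn as nn

class ReLUResNet(nn.Module):
    """One plausible 5-layer ReLU ResNet of width 40: a linear embedding,
    residual ReLU blocks, and a scalar linear readout."""
    def __init__(self, n_in=30, width=40, depth=5):
        super().__init__()
        self.embed = nn.Linear(n_in, width)
        self.blocks = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))
        self.readout = nn.Linear(width, 1)

    def forward(self, x):
        h = self.embed(x)
        for block in self.blocks:
            h = h + torch.relu(block(h))  # skip connection around each ReLU layer
        return self.readout(h).squeeze(-1)

# Constant-step SGD on the square loss, matching the Figure 2 description.
X = (torch.randint(0, 2, (1024, 30)) * 2 - 1).float()   # uniform on {-1,1}^30
y = torch.cumprod(X[:, :10], dim=1).sum(dim=1)           # staircase target S_10
model = ReLUResNet()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)       # constant step size
for _ in range(2_000):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
```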