Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The staircase property: How hierarchical structure can guide deep learning
Authors: Emmanuel Abbe, Enric Boix-Adsera, Matthew S. Brennan, Guy Bresler, Dheeraj Nagaraj
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further back our theoretical results with experiments showing that staircase functions are learnable by more standard ResNet architectures with stochastic gradient descent. Both the theoretical and experimental results support the fact that the staircase property has a role to play in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks that can emulate any Statistical Query or PAC algorithm, as recently shown. |
| Researcher Affiliation | Academia | Emmanuel Abbe (EPFL), Enric Boix-Adsera (MIT), Matthew Brennan (MIT), Guy Bresler (MIT), Dheeraj Nagaraj (MIT) |
| Pseudocode | Yes | Algorithm 1: TRAINNETWORKLAYERWISE. Input: sample access to the distribution {(x, g(x))}_{x ∈ {−1,1}^n}; hyperparameters W, L, p₁, p₂, λ₁, λ₂, …, B, ε_stop, …. Output: trained parameters of the neural network after layer-wise training. Algorithm 2: TRAINNEURON(v, w₀; λ₁, λ₂, …, B, ε_stop, …). Algorithm 3: NEURONSGD(v, w₀; λ₁, λ₂, …, B, ε_stop, …). (A hedged skeleton of this layer-wise control flow appears after the table.) |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. |
| Open Datasets | No | The paper focuses on learning mathematically defined 'staircase functions' and does not use or provide access information for a publicly available dataset in the conventional sense (e.g., CIFAR-10, ImageNet). |
| Dataset Splits | No | The paper works with synthetically generated functions and does not explicitly describe dataset splits (e.g., percentages, sample counts) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions compute resources are detailed in supplemental material but does not provide specific hardware details (e.g., GPU/CPU models, memory) in the main body. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) in the main text. |
| Experiment Setup | Yes | Figure 2: Comparison between training χ_{1:10} and S_{10} with n = 30 on the same 5-layer ReLU ResNet of width 40. Training is SGD with constant step size on the square loss. (A hedged reconstruction of this setup appears after the table.) |
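The Pseudocode row above names a layer-wise procedure, TRAINNETWORKLAYERWISE, which calls TRAINNEURON/NEURONSGD to fit individual neurons. The skeleton below is a minimal PyTorch sketch of that control flow only: each layer's neurons are fit one at a time to the current residual with regularized SGD, then frozen. The random initialization, the boosting-style residual update, and reading λ₁/λ₂ as ℓ₁/ℓ₂ penalties are assumptions; the paper's exact update and its ε_stop stopping rule are not reproduced here.

```python
import torch

def train_neuron(features, labels, lam1, lam2, lr, batch_size, steps):
    """Hedged stand-in for TRAINNEURON/NEURONSGD: fit one ReLU neuron and its
    output weight to the residual target with SGD. lam1/lam2 are read as
    l1/l2 penalties (an assumption); the paper's eps_stop rule is omitted."""
    d = features.shape[1]
    w = (0.1 * torch.randn(d)).requires_grad_()  # the paper passes an explicit w0
    a = torch.randn(1).requires_grad_()          # output weight of the neuron
    opt = torch.optim.SGD([w, a], lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, features.shape[0], (batch_size,))
        pred = a * torch.relu(features[idx] @ w)
        loss = ((pred - labels[idx]) ** 2).mean() \
               + lam1 * w.abs().sum() + lam2 * (w ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach(), a.detach()

def train_network_layerwise(x, y, width, num_layers, **hp):
    """Layer-wise control flow: train each layer's neurons one by one against
    the current residual, freeze them, and feed their outputs (concatenated
    with the old features) to the next layer."""
    features, residual, layers = x.clone(), y.clone(), []
    for _ in range(num_layers):
        neurons = []
        for _ in range(width):
            w, a = train_neuron(features, residual, **hp)
            neurons.append((w, a))
            residual = residual - a * torch.relu(features @ w)  # greedy residual fit
        layers.append(neurons)
        outs = torch.stack([torch.relu(features @ w) for w, _ in neurons], dim=1)
        features = torch.cat([features, outs], dim=1)
    return layers

# Example call (all hyperparameter values are illustrative, not the paper's):
# layers = train_network_layerwise(x, y, width=10, num_layers=2,
#                                  lam1=1e-4, lam2=1e-4, lr=0.05,
#                                  batch_size=64, steps=1000)
```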
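The Figure 2 setup quoted in the Experiment Setup row is concrete enough to sketch end to end. Below is a minimal, hypothetical PyTorch reconstruction: uniform inputs x ∈ {−1,1}³⁰ are labeled with either the staircase S₁₀(x) = x₁ + x₁x₂ + ⋯ + x₁⋯x₁₀ or the parity χ_{1:10}(x) = x₁⋯x₁₀, and a width-40, 5-layer ReLU ResNet is trained with constant-step SGD on the square loss. The residual block, learning rate, batch size, and step count are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

def sample_batch(batch_size, n=30, k=10, target="staircase"):
    """Uniform x in {-1,1}^n; label with the staircase S_k or the parity chi_{1:k}."""
    x = torch.randint(0, 2, (batch_size, n)).float() * 2 - 1
    prefix = torch.cumprod(x[:, :k], dim=1)  # x1, x1x2, ..., x1...xk
    y = prefix.sum(dim=1) if target == "staircase" else prefix[:, -1]
    return x, y

class ResBlock(nn.Module):
    """One residual ReLU block; the paper's exact block structure is not specified here."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)
    def forward(self, h):
        return h + torch.relu(self.fc(h))

class ReLUResNet(nn.Module):
    """Width-40, 5-layer ReLU ResNet, matching the shape quoted from Figure 2."""
    def __init__(self, n=30, width=40, depth=5):
        super().__init__()
        self.embed = nn.Linear(n, width)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(depth)])
        self.head = nn.Linear(width, 1)
    def forward(self, x):
        return self.head(self.blocks(self.embed(x))).squeeze(-1)

model = ReLUResNet()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)  # constant step size (value assumed)
for step in range(5_000):                           # step count assumed
    x, y = sample_batch(256, target="staircase")    # switch to "parity" for chi_{1:10}
    loss = ((model(x) - y) ** 2).mean()             # square loss, as quoted
    opt.zero_grad(); loss.backward(); opt.step()
```

Running the loop once with `target="staircase"` and once with `target="parity"` mirrors the comparison reported in Figure 2.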