Deep ReLU Networks Have Surprisingly Few Activation Patterns

Authors: Boris Hanin, David Rolnick

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical and empirical analyses of the typical complexity of the function computed by a ReLU network N. We show empirically that this bound, which is independent of the depth, is tight both at initialization and during training, even on memorization tasks that should maximize the number of activation patterns.
Researcher Affiliation | Collaboration | Boris Hanin (Facebook AI Research; Texas A&M University; bhanin@math.tamu.edu) and David Rolnick (University of Pennsylvania, Philadelphia, PA, USA; drolnick@seas.upenn.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | The average number of activation regions in a 2D cross-section of input space, for fully connected networks of various architectures training on MNIST.
Dataset Splits | No | The paper mentions training on datasets such as MNIST and random 2D points, but it does not specify any training/validation/test splits or percentages.
Hardware Specification | No | The paper does not provide any specific hardware details (such as GPU/CPU models or memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | Depth 3, width 32 network trained on MNIST with varying levels of label corruption. The number of regions predicted by Theorem 5 for such a network is 96²/2! = 4608. The learning rate 10⁻³, which gives the maximum number of regions, is used in all other experiments, while 10⁻² is too large and causes learning to fail.
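For context on the "Yes" rows above, the following is a minimal NumPy sketch, not the authors' code (none is linked), of the kind of measurement the paper reports: record the ReLU on/off pattern of a depth-3, width-32 fully connected network over a grid in a random 2D cross-section of 784-dimensional (MNIST-sized) input space, and compare the count of distinct patterns with the Theorem 5 prediction of 96²/2! = 4608. The grid resolution, the choice of random plane, and the He-style initialization are illustrative assumptions, and grid sampling only approximates the true region count.

```python
# Minimal sketch (assumptions noted in comments; not the authors' code):
# count distinct ReLU activation patterns of a randomly initialized
# depth-3, width-32 fully connected network on a grid in a random 2D
# cross-section of 784-dimensional input space, and compare with the
# Theorem 5 prediction (#neurons)^2 / 2! = 96^2 / 2! = 4608.
import math
import numpy as np

rng = np.random.default_rng(0)

def init_network(input_dim=784, width=32, depth=3):
    # He-style Gaussian initialization (an assumption, not stated in the table).
    dims = [input_dim] + [width] * depth
    weights = [rng.normal(0.0, np.sqrt(2.0 / dims[i]), size=(dims[i + 1], dims[i]))
               for i in range(depth)]
    biases = [rng.normal(0.0, 1e-2, size=dims[i + 1]) for i in range(depth)]
    return weights, biases

def activation_pattern(x, weights, biases):
    # Binary on/off state of every hidden ReLU for input x.
    states = []
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        states.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(np.concatenate(states).tolist())

def count_patterns_on_plane(weights, biases, input_dim=784, grid=100, radius=2.0):
    # Distinct patterns on a grid in a random 2D slice through the origin.
    # Grid sampling undercounts regions smaller than the grid spacing.
    v1 = rng.normal(size=input_dim); v1 /= np.linalg.norm(v1)
    v2 = rng.normal(size=input_dim); v2 -= (v2 @ v1) * v1; v2 /= np.linalg.norm(v2)
    ts = np.linspace(-radius, radius, grid)
    return len({activation_pattern(a * v1 + b * v2, weights, biases)
                for a in ts for b in ts})

weights, biases = init_network()
n_neurons = sum(b.size for b in biases)         # 3 * 32 = 96 hidden neurons
bound = n_neurons ** 2 // math.factorial(2)     # 96^2 / 2! = 4608
print("Theorem 5 prediction for a 2D slice:", bound)
print("Distinct patterns found on the grid:", count_patterns_on_plane(weights, biases))
```

The paper's finding is that the measured count stays near this depth-independent prediction both at initialization and after training; the sketch only reproduces the initialization-time measurement.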
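The Experiment Setup row fixes the architecture, dataset, label corruption, and learning rate, but not the optimizer, batch size, epoch count, or corruption fraction. Below is a hedged PyTorch-style sketch of such a training run; Adam, batch size 128, 5 epochs, and 50% corruption are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the Experiment Setup row: a depth-3, width-32 fully
# connected ReLU network trained on MNIST with randomly corrupted labels
# and learning rate 1e-3. Optimizer (Adam), batch size, epoch count, and
# corruption fraction are assumptions made for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def corrupt_labels(dataset, fraction, num_classes=10, seed=0):
    # Replace a random `fraction` of the labels with uniformly random classes.
    g = torch.Generator().manual_seed(seed)
    targets = dataset.targets.clone()
    n_corrupt = int(fraction * len(targets))
    idx = torch.randperm(len(targets), generator=g)[:n_corrupt]
    targets[idx] = torch.randint(0, num_classes, (n_corrupt,), generator=g)
    dataset.targets = targets
    return dataset

# Depth 3, width 32: three hidden ReLU layers of 32 units (96 neurons total).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 10),
)

train_set = corrupt_labels(
    datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
    fraction=0.5,  # "varying levels of label corruption"; 0.5 is an arbitrary example
)
loader = DataLoader(train_set, batch_size=128, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # the table's 10^-3 learning rate
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # epoch count is an assumption
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
```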