Complexity of Linear Regions in Deep Networks

Authors: Boris Hanin, David Rolnick

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3. Experiments: We empirically verified our theorems and further examined how linear regions of a network change during training. All experiments below were performed with fully-connected networks, initialized with He normal weights (i.i.d. with variance 2/fan-in) and biases drawn i.i.d. normal with variance 10^-6 (to prevent collapse of regions at initialization, which occurs when all biases are identically zero). Training was performed on the vectorized MNIST (input dimension 784) using the Adam optimizer at learning rate 10^-3. All networks attain test accuracy in the range 95–98%.
Researcher Affiliation | Collaboration | Boris Hanin (Department of Mathematics, Texas A&M University, and Facebook AI Research, New York) and David Rolnick (University of Pennsylvania); equal contribution.
Pseudocode | No | The paper describes methods and experiments in prose and mathematical notation but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | Training was performed on the vectorized MNIST (input dimension 784).
Dataset Splits | No | The paper mentions 'Training was performed on the vectorized MNIST' but does not specify the exact training, validation, or test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper describes the experimental setup regarding networks and initialization (e.g., 'fully-connected networks', 'He normal weights', 'Adam optimizer'), but it does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions the use of the 'Adam optimizer' and 'ReLU activation' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | All experiments below were performed with fully-connected networks, initialized with He normal weights (i.i.d. with variance 2/fan-in) and biases drawn i.i.d. normal with variance 10^-6 (to prevent collapse of regions at initialization, which occurs when all biases are identically zero). Training was performed on the vectorized MNIST (input dimension 784) using the Adam optimizer at learning rate 10^-3.
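For concreteness, the following is a minimal PyTorch sketch of the setup quoted above; it is not the authors' code. The network depth and width, batch size, and number of training epochs are illustrative assumptions, since this excerpt specifies only He-normal weights, biases drawn i.i.d. normal with variance 10^-6, vectorized MNIST inputs of dimension 784, and the Adam optimizer at learning rate 10^-3.

```python
# Hedged sketch of the described setup (assumed widths, batch size, and epoch count).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_mlp(widths=(784, 32, 32, 10)):
    layers = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        linear = nn.Linear(fan_in, fan_out)
        # He normal weights: i.i.d. normal with variance 2 / fan-in.
        nn.init.normal_(linear.weight, mean=0.0, std=(2.0 / fan_in) ** 0.5)
        # Small random biases (variance 1e-6, i.e. std 1e-3) so that linear
        # regions do not collapse at initialization.
        nn.init.normal_(linear.bias, mean=0.0, std=1e-3)
        layers += [linear, nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the ReLU after the output layer

model = make_mlp()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Vectorized MNIST: each 28x28 image is flattened to a 784-dimensional input.
train_data = datasets.MNIST(
    "data", train=True, download=True,
    transform=transforms.Compose([transforms.ToTensor(),
                                  transforms.Lambda(lambda x: x.view(-1))]))
loader = DataLoader(train_data, batch_size=128, shuffle=True)

for epoch in range(2):  # epoch count is illustrative; the excerpt does not state it
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

The sketch covers only the initialization, dataset, and optimizer choices quoted from the paper; any hyperparameters not mentioned above (architecture size, batch size, epochs) would need to be taken from the full paper.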