Empirical Bounds on Linear Regions of Deep Rectifier Networks

Authors: Thiago Serra, Srikumar Ramalingam (pp. 5628-5635)

AAAI 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We tested on rectifier networks trained on the MNIST benchmark dataset (LeCun et al. 1998), consisting of 22 units distributed in two hidden layers and 10 output units, with 10 distinct networks for each distribution of units between the hidden layers. See Appendix G for more details about the networks and the implementation. For each size of parity constraints k, which we denote as XOR-k, we measure the time to find the smallest coefficients H̲ᵢˡ and H̄ᵢˡ for each unit along with the subsequent time of MIPBound (Algorithm 1). We let MIPBound run for enough steps to obtain a probability of 99.5% in case all tested constraints of a given size leave the formulation feasible, and we report the largest lower bound with probability at least 95%. We also use the approach in (Serra, Tjandraatmadja, and Ramalingam 2018) to count the exact number of regions for benchmarking. Since counting can be faster than sampling for smaller sets, we define a DNN with η < 12 as small, and as large otherwise. We use Configuration Upper Bound (Configuration UB) for the bound in (Serra, Tjandraatmadja, and Ramalingam 2018). The upper bound from Theorem 2, which we denote as Empirical Upper Bound (Empirical UB), is computed in a small fraction of the time needed to obtain the coefficients H̲ᵢˡ and H̄ᵢˡ for the lower bound. We denote as APP-k the average of the XOR-k Lower Bound (LB) and the Empirical UB. Table 1 shows the gap closed by Empirical UB. Figure 2 (top) compares the bounds with the number of linear regions. Figure 2 (bottom) compares the time for exact counting and approximation. Figure 3 compares APP-k with the accuracy of networks not having particularly narrow layers, in which case the number of regions relates to network accuracy (Serra, Tjandraatmadja, and Ramalingam 2018).
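The counting discussed above rests on the fact that each linear region of a rectifier network corresponds to a distinct pattern of active/inactive ReLUs. A minimal sketch of the idea, using input sampling rather than the paper's exact MIP-based enumeration (the toy network, function names, and sampling scheme here are illustrative assumptions, not the authors' implementation):

```python
import random

def activation_pattern(weights, biases, x):
    """Forward pass through a small rectifier (ReLU) network, recording
    which units fire at each layer; each distinct pattern corresponds to
    one linear region of the network's piecewise-linear input-output map."""
    h = x
    pattern = []
    for W, b in zip(weights, biases):
        # Pre-activations of the current layer (pure-Python matrix-vector product).
        z = [sum(w * v for w, v in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        pattern.append(tuple(zi > 0 for zi in z))
        h = [max(zi, 0.0) for zi in z]  # ReLU
    return tuple(pattern)

def sampled_region_count(weights, biases, n_samples, dim, rng):
    """Lower-bound the number of linear regions by sampling random inputs
    and counting distinct activation patterns (a sampling-style estimate;
    regions that no sample lands in are missed)."""
    patterns = {activation_pattern(weights, biases,
                                   [rng.uniform(-1, 1) for _ in range(dim)])
                for _ in range(n_samples)}
    return len(patterns)
```

For a one-input, one-layer network with units breaking at x = -0.5, 0, and 0.5, the line splits into four intervals, and dense sampling recovers exactly four activation patterns.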
Researcher Affiliation | Academia | Thiago Serra (Bucknell University, USA) and Srikumar Ramalingam (The University of Utah, USA); thiago.serra@bucknell.edu, srikumar@cs.utah.edu
Pseudocode | Yes | Algorithm 1, MIPBound, computes the probability of some lower bounds on the solutions of a formulation F with n binary variables by adding parity constraints of size k.
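The principle behind MIPBound follows the standard XOR-counting recipe: each random parity constraint cuts the feasible set roughly in half, so if the formulation remains feasible after t such constraints, the original set likely has on the order of 2^t solutions. A toy sketch of that principle under stated assumptions, using brute-force feasibility checks over an explicit solution list in place of a MIP solver (function names are hypothetical, not from the paper):

```python
import random

def random_xor_k(n, k, rng):
    """Sample one random parity constraint over k of n binary variables:
    the sum of the chosen variables mod 2 must equal a random parity bit."""
    idx = rng.sample(range(n), k)
    parity = rng.randrange(2)
    return idx, parity

def satisfies(x, constraint):
    """Check one solution vector x against one XOR-k constraint."""
    idx, parity = constraint
    return sum(x[i] for i in idx) % 2 == parity

def xor_lower_bound(solutions, n, k, t, trials, rng):
    """Estimate how often the solution set survives t random XOR-k
    constraints. Any fixed solution satisfies each constraint with
    probability 1/2, so a set much smaller than 2^t rarely survives,
    while a set much larger than 2^t almost always does."""
    survived = 0
    for _ in range(trials):
        cons = [random_xor_k(n, k, rng) for _ in range(t)]
        # Brute-force feasibility check (a MIP solver would do this step
        # in the paper's setting).
        if any(all(satisfies(x, c) for c in cons) for x in solutions):
            survived += 1
    return survived / trials
```

On a full 6-variable cube (64 solutions), feasibility after 2 random constraints is near certain; a singleton solution set survives 5 constraints only about once in 32 trials, matching the halving intuition.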
Open Source Code | No | The paper states 'An extended version of this paper with appendices can be found at https://arxiv.org/abs/1810.03370.' This links to an arXiv preprint, not to source code, and the paper makes no explicit statement about code availability for the described methodology.
Open Datasets | Yes | We tested on rectifier networks trained on the MNIST benchmark dataset (LeCun et al. 1998)
Dataset Splits | No | The paper mentions testing on the MNIST dataset but does not specify the training, validation, or test split percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory used for running the experiments; it only mentions general aspects like 'networks trained on the MNIST benchmark dataset'.
Software Dependencies | No | The paper mentions 'See Appendix G for more details about the networks and the implementation' but does not explicitly list software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') within the provided text.
Experiment Setup | No | The paper states 'See Appendix G for more details about the networks and the implementation' but does not provide specific experimental setup details such as hyperparameter values (learning rate, batch size, number of epochs) or training configurations within the main body of the text.