On the Number of Linear Regions of Convolutional Neural Networks
Authors: Huan Xiong, Lei Huang, Mengyang Yu, Li Liu, Fan Zhu, Ling Shao
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our results by randomly sampling data points from the input space and determining which linear regions they belong to by Definition 1. For a given CNN architecture, we initialize the parameters (weights and biases) based on the He initialization (He et al., 2015). Given the sampled weights, each data point in the input space is sampled from a normal distribution with mean 0 and standard deviation v. We use v ranging over {3, 5, 7, 9, 11, 13} and report the maximal number of linear regions over these values of v. We sample 2 × 10⁹ data points in total, and for each data point, we determine which region it belongs to based on Definition 1 (for a new data point X_0, we simply calculate the sign of z(X_0, θ) for each neuron z and use it to determine whether X_0 belongs to a new region). This sampling method may skip some regions, so the number of linear regions obtained by sampling is usually smaller than the exact number. However, when the number of sampled points is large enough, we can usually find almost all the linear regions. For example, we use this sampling method to find all R_N linear regions for a one-layer CNN N in Table 1, and to find a number of regions between the lower and upper bounds for two-layer CNNs in Table 2. In this way, we validate the correctness of our results. We provide the code for the experiments in the Supplementary Material. |
| Researcher Affiliation | Academia | Mohamed bin Zayed University of Artificial Intelligence, UAE; Inception Institute of Artificial Intelligence, Abu Dhabi, UAE. |
| Pseudocode | No | The paper presents mathematical formulas, theorems, and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the codes for the experiments in the Supplementary Material. |
| Open Datasets | No | For a given CNN architecture, we initialize the parameters (weights and biases) based on the He initialization (He et al., 2015). Given the sampled weights, each data point in the input space is sampled from a normal distribution with mean 0 and standard deviation v. We use v ranging over {3, 5, 7, 9, 11, 13} and report the maximal number of linear regions over these values of v. We sample 2 × 10⁹ data points in total... The paper describes generating data points by sampling from a normal distribution, rather than using an existing public dataset. |
| Dataset Splits | No | The paper describes a sampling method to find linear regions for validation but does not specify traditional train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions that "codes for the experiments" are provided, but it does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | For a given CNN architecture, we initialize the parameters (weights and biases) based on the He initialization (He et al., 2015). Given the sampled weights, each data point in the input space is sampled from a normal distribution with mean 0 and standard deviation v. We use v ranging over {3, 5, 7, 9, 11, 13} and report the maximal number of linear regions over these values of v. We sample 2 × 10⁹ data points in total... (a minimal sketch of this sampling-and-counting procedure is given below the table). |
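
The experiment setup quoted above amounts to: He-initialize a ReLU CNN, draw inputs from N(0, v²), assign each input to a linear region via the sign pattern of its neurons' pre-activations, and count the distinct patterns (a lower bound on the true number of regions). The sketch below is a hypothetical illustration of that procedure, not the code released in the paper's Supplementary Material; the toy one-layer 1-D CNN, its sizes, and the small sample counts are assumptions chosen for readability.

```python
# Hypothetical sketch: estimating the number of linear regions of a
# one-layer ReLU CNN by sampling, following the procedure quoted above.
import numpy as np

rng = np.random.default_rng(0)

# --- Toy one-layer 1-D CNN configuration (assumed for illustration) ---
input_dim   = 8     # length of the 1-D input signal
kernel_size = 3     # convolution filter width
num_filters = 2     # number of convolution filters
num_patches = input_dim - kernel_size + 1  # stride 1, no padding

# He initialization of the filters (He et al., 2015); zero biases.
fan_in  = kernel_size
weights = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(num_filters, kernel_size))
biases  = np.zeros(num_filters)

def activation_pattern(x):
    """Return the sign pattern of all pre-activations for input x.

    Each (filter, patch) pair is one ReLU neuron; the joint sign vector of
    the pre-activations identifies the linear region containing x
    (cf. Definition 1 in the paper).
    """
    pattern = []
    for f in range(num_filters):
        for p in range(num_patches):
            z = weights[f] @ x[p:p + kernel_size] + biases[f]
            pattern.append(z > 0)
    return tuple(pattern)

def count_regions_by_sampling(num_samples, std):
    """Lower-bound the number of linear regions by sampling inputs ~ N(0, std^2)."""
    seen = set()
    for _ in range(num_samples):
        x = rng.normal(0.0, std, size=input_dim)
        seen.add(activation_pattern(x))
    return len(seen)

if __name__ == "__main__":
    # The paper sweeps the standard deviation over {3, 5, 7, 9, 11, 13} and
    # reports the maximum; the sample count here is tiny compared with the
    # paper's 2 x 10^9 points, so the estimate is a loose lower bound.
    counts = {std: count_regions_by_sampling(num_samples=20_000, std=std)
              for std in (3, 5, 7, 9, 11, 13)}
    print("estimated #regions per std:", counts)
    print("reported estimate (max over std):", max(counts.values()))
```

Because sampling can only discover regions that happen to contain a sampled point, the distinct-pattern count never exceeds the exact region count, which is why the paper uses a very large sample size and takes the maximum over several input standard deviations.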