Estimation and Comparison of Linear Regions for ReLU Networks

Authors: Yuan Wang

IJCAI 2022

Reproducibility assessment: each entry below lists the variable, the result, and the LLM's response.
Research Type: Experimental. "We provide both theoretical and empirical evidence for the point of view that shallow networks tend to have higher complexity than deep ones when the total number of neurons is fixed. In the theoretical part, we prove that this is the case for networks whose neurons in the hidden layers are arranged in the forms of 1×2n, 2×n and n×2; in the empirical part, we implement an algorithm that precisely tracks (hence counts) all the linear regions, and run it on networks with various structures."
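
As an illustration of what this shallow-versus-deep comparison measures, here is a minimal sketch that lower-bounds the region count by sampling ReLU activation patterns on a dense 2-D grid. This is a stand-in for intuition, not the paper's exact tracking algorithm, and the layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(layer_sizes):
    """Random weights and biases, uniform on [-1, 1] as in the paper's setup."""
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.uniform(-1.0, 1.0, size=(fan_out, fan_in))
        b = rng.uniform(-1.0, 1.0, size=fan_out)
        params.append((W, b))
    return params

def activation_pattern(params, x):
    """Concatenated on/off pattern of every hidden ReLU for input x."""
    pattern = []
    h = x
    for W, b in params[:-1]:          # hidden layers only
        z = W @ h + b
        pattern.extend(z > 0)
        h = np.maximum(z, 0.0)
    return tuple(pattern)

def count_regions_on_grid(params, lim=1.0, n=200):
    """Distinct activation patterns on an n-by-n grid: a lower bound
    on the number of linear regions inside [-lim, lim]^2."""
    xs = np.linspace(-lim, lim, n)
    patterns = {activation_pattern(params, np.array([u, v]))
                for u in xs for v in xs}
    return len(patterns)

# Compare a shallow 1x8 net against a deeper 4x2 net, same neuron budget.
shallow = random_relu_net([2, 8, 1])
deep = random_relu_net([2, 2, 2, 2, 2, 1])
print(count_regions_on_grid(shallow), count_regions_on_grid(deep))
```

Two inputs that share an activation pattern lie in the same linear region, so the number of distinct patterns observed on the grid can only undercount the true number of regions.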
Researcher Affiliation: Industry. Yuan Wang, Garena, Singapore (yuanwang2011@outlook.com).

Pseudocode: No. The paper describes the "Algorithm of Finding Linear Regions" in Section 3 and illustrates a pipeline in Figure 2, but it does not present a formal pseudocode block or algorithm listing.
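
Although the paper gives no pseudocode, the simplest case illustrates what such an algorithm computes. Below is a sketch for a one-hidden-layer ReLU network on a 1-D input, where the count is exact for generic weights; this setting is chosen for clarity and is not the paper's Section 3 algorithm, which handles higher-dimensional inputs and deeper networks.

```python
import numpy as np

def linear_regions_1d(w, b, lo=-1.0, hi=1.0):
    """Count the linear pieces of x -> sum_i v_i * relu(w_i*x + b_i)
    on [lo, hi]. Each neuron with w_i != 0 contributes a breakpoint
    at x = -b_i / w_i; k distinct breakpoints inside the interval
    split it into k + 1 pieces (exact for generic weights)."""
    w = np.asarray(w, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = w != 0
    breaks = -b[mask] / w[mask]
    inside = np.unique(breaks[(breaks > lo) & (breaks < hi)])
    return len(inside) + 1

rng = np.random.default_rng(1)
w = rng.uniform(-1, 1, size=8)
b = rng.uniform(-1, 1, size=8)
print(linear_regions_1d(w, b))   # at most 9 for 8 hidden neurons
```

Each neuron w_i*x + b_i switches on or off at x = -b_i/w_i, which is why the region count in this setting reduces to counting distinct breakpoints inside the interval.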
Open Source Code: No. The paper mentions "our code" when describing the algorithm's implementation but does not provide any concrete access information for it (a link, a repository, or a statement of public release).

Open Datasets: Yes. "Here we do an experiment for the two-spiral data which is widely used to train networks of small scale (e.g. [Sopena et al., 1999], [Hunter et al., 2012]). We follow the TensorFlow Playground [Google, 2016] to construct the two-spiral data as well as the input data."

Dataset Splits: Yes. "We take 500 points from each spiral and split the total data randomly into 900 training data and 100 test data."
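
The two entries above pin down the dataset and its split. A sketch of one possible two-spiral construction and the 900/100 split follows; the exact radii, angles, and noise of the Playground data are assumptions here, not taken from the paper.

```python
import numpy as np

def two_spirals(n_per_spiral=500, noise=0.0, seed=0):
    """Two interleaved spirals in the plane, similar in spirit to the
    TensorFlow Playground dataset (the parameterization is an assumption)."""
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0, 1, n_per_spiral)) * 3 * np.pi
    x1 = np.column_stack([t * np.cos(t), t * np.sin(t)])
    x2 = -x1                                  # second spiral: rotate by pi
    X = np.vstack([x1, x2]) + noise * rng.normal(size=(2 * n_per_spiral, 2))
    y = np.concatenate([np.zeros(n_per_spiral), np.ones(n_per_spiral)])
    return X, y

# 500 points per spiral, shuffled into 900 train / 100 test as in the paper.
X, y = two_spirals()
idx = np.random.default_rng(0).permutation(len(X))
X_train, y_train = X[idx[:900]], y[idx[:900]]
X_test, y_test = X[idx[900:]], y[idx[900:]]
print(X_train.shape, X_test.shape)   # (900, 2) (100, 2)
```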
Hardware Specification: No. The paper mentions running experiments on "multiple machines" and "a cluster of machines" but does not provide specific hardware details such as GPU or CPU models, or memory specifications.

Software Dependencies: No. The paper mentions using the TensorFlow Playground but does not specify its version or any other software dependencies with version numbers.

Experiment Setup: Yes. "All the weights and biases follow the uniform distribution between -1 and 1."
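
One way to reproduce the stated initialization in TensorFlow; the layer sizes below are illustrative, since the paper sweeps many network structures.

```python
import tensorflow as tf

# Uniform [-1, 1] initializer for every weight and bias, per the stated setup.
init = tf.keras.initializers.RandomUniform(minval=-1.0, maxval=1.0)

# Illustrative structure (2 inputs, one hidden layer of 8 ReLUs); the paper
# itself compares many depth/width arrangements with a fixed neuron budget.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu",
                          kernel_initializer=init, bias_initializer=init),
    tf.keras.layers.Dense(1,
                          kernel_initializer=init, bias_initializer=init),
])
```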