Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions

Authors: Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the training error and the decision regions of the trained network in Figure 4. The grid size in each case of Figure 4 has been manually chosen so that one can see clearly the connected/disconnected components in the decision regions. First, we observe that for two hidden units (n1 = 2), the network satisfies the condition of Theorem 3.10 and thus can only learn connected regions, which one can also clearly see in the figure, where one basically gets a linear separator. However, for three hidden units (n1 = 3), one can see that the network can produce disconnected decision regions, which shows that both our Theorems 3.10 and 3.11 are tight, in the sense that width d + 1 is already sufficient to produce disconnected components, whereas the results say that for width less than d + 1 the decision regions have to be connected.
Researcher Affiliation | Academia | 1 Department of Mathematics and Computer Science, Saarland University, Germany; 2 University of Tübingen, Germany.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information for open-source code related to the methodology.
Open Datasets | Yes | We use a single image of digit 1 from the MNIST dataset to create a new artificial dataset... In Figure 7, we show another similar experiment on MNIST dataset, but now for all the 10 image classes.
Dataset Splits | No | The paper mentions '2000 training images' and discusses training error, but does not specify validation splits or percentages for the datasets used.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper mentions methods like 'leaky ReLU', 'SGD', and 'cross-entropy loss', but does not specify any software packages or libraries with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0') that were used in their experiments.
Experiment Setup | Yes | We then train this network by using SGD with momentum for 1000 epochs and learning rate 0.1, and reduce it by a factor of 2 after every 50 epochs.
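
To make the width condition quoted under Research Type concrete, here is a minimal sketch of such a one-hidden-layer classifier. It assumes PyTorch and two-dimensional inputs (d = 2); the framework choice, the leaky-ReLU slope, and the two-class output head are illustrative assumptions, since the paper excerpt only names leaky ReLU and the hidden width n1.

```python
# Illustrative sketch (not the authors' code): a one-hidden-layer network on
# 2-D inputs, where the hidden width n1 controls whether disconnected
# decision regions are even possible (Theorems 3.10/3.11 in the paper).
import torch.nn as nn

d = 2    # input dimension of the toy data
n1 = 3   # width d + 1 = 3 can already produce disconnected decision regions;
         # with n1 = 2 (i.e. width < d + 1) the learned regions must be connected.

model = nn.Sequential(
    nn.Linear(d, n1),
    nn.LeakyReLU(negative_slope=0.1),  # slope is an assumption; the paper only says "leaky ReLU"
    nn.Linear(n1, 2),                  # two-class scores fed to a cross-entropy loss
)
```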
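
The row on Experiment Setup likewise translates into a short training loop. The following is a hedged sketch, reusing `model` from the snippet above: the learning rate of 0.1, the 1000 epochs, and the halving of the rate every 50 epochs follow the quoted text, while the momentum value, batch size, and the synthetic stand-in data are assumptions added only so the snippet runs.

```python
# Hedged sketch of the reported training schedule: SGD with momentum,
# learning rate 0.1, 1000 epochs, learning rate halved every 50 epochs.
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(200, 2)                  # stand-in 2-D points; not the paper's actual dataset
y = (X.norm(dim=1) > 1.0).long()         # stand-in labels for a two-class problem
train_loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)   # momentum 0.9 is an assumption
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(1000):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                      # halve the learning rate every 50 epochs
```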