Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions
Authors: Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the training error and the decision regions of the trained network in Figure 4. The grid size in each case of Figure 4 has been manually chosen so that one can see clearly the connected/disconnected components in the decision regions. First, we observe that for two hidden units (n1 = 2), the network satisfies the condition of Theorem 3.10 and thus can only learn connected regions, which one can also clearly see in the figure, where one basically gets a linear separator. However, for three hidden units (n1 = 3), one can see that the network can produce disconnected decision regions, which shows that both our Theorems 3.10 and 3.11 are tight, in the sense that width d + 1 is already sufficient to produce disconnected components, whereas the results say that for width less than d + 1 the decision regions have to be connected. |
| Researcher Affiliation | Academia | ¹Department of Mathematics and Computer Science, Saarland University, Germany; ²University of Tübingen, Germany. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information for open-source code related to the methodology. |
| Open Datasets | Yes | We use a single image of digit 1 from the MNIST dataset to create a new artificial dataset... In Figure 7, we show another similar experiment on MNIST dataset, but now for all the 10 image classes. |
| Dataset Splits | No | The paper mentions '2000 training images' and discusses training error, but does not specify validation splits or percentages for the datasets used. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions methods like 'leaky ReLU', 'SGD', and 'cross-entropy loss', but does not specify any software packages or libraries with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0') that were used in its experiments. |
| Experiment Setup | Yes | We then train this network by using SGD with momentum for 1000 epochs and learning rate 0.1 and reduce it by a factor of 2 after every 50 epochs. |
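
The Research Type row above quotes the paper's toy experiment behind Theorems 3.10 and 3.11: with input dimension d = 2, a one-hidden-layer network of width 2 can only realize connected decision regions, while width 3 = d + 1 can already carve out disconnected ones. The sketch below is not the authors' code; it assumes PyTorch (the paper names no framework), a hypothetical two-blob toy dataset rather than the paper's exact data, and a momentum value of 0.9. The helpers `make_two_blob_data` and `make_net` are illustrative names only.

```python
# Hedged sketch: compare a one-hidden-layer leaky-ReLU network of width n1 = 2
# (<= d, so decision regions must be connected by Theorem 3.10) against
# n1 = 3 = d + 1 (disconnected regions become possible, Theorem 3.11) on a
# hypothetical 2-D toy set whose positive class consists of two separated blobs.
import torch
import torch.nn as nn

def make_two_blob_data(n_per_blob=500, seed=0):
    g = torch.Generator().manual_seed(seed)
    pos = torch.cat([torch.randn(n_per_blob, 2, generator=g) * 0.3 + torch.tensor([-2.0, 0.0]),
                     torch.randn(n_per_blob, 2, generator=g) * 0.3 + torch.tensor([+2.0, 0.0])])
    neg = torch.randn(2 * n_per_blob, 2, generator=g) * 0.3   # single negative blob at the origin
    x = torch.cat([pos, neg])
    y = torch.cat([torch.ones(len(pos), dtype=torch.long),
                   torch.zeros(len(neg), dtype=torch.long)])
    return x, y

def make_net(n1):
    # one hidden layer of width n1, leaky-ReLU activation, two output logits
    return nn.Sequential(nn.Linear(2, n1), nn.LeakyReLU(0.1), nn.Linear(n1, 2))

x, y = make_two_blob_data()
for n1 in (2, 3):                       # width d vs. width d + 1
    net = make_net(n1)
    opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)  # momentum value assumed
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(1000):               # full-batch gradient steps
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
    err = (net(x).argmax(1) != y).float().mean().item()
    print(f"n1={n1}: training error {err:.3f}")
```

On data of this shape one would expect the n1 = 2 network to end up with a roughly linear separator and nonzero training error, while n1 = 3 can fit both positive blobs with a disconnected decision region, mirroring the behaviour the paper reports in Figure 4.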
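The Experiment Setup row quotes SGD with momentum for 1000 epochs, learning rate 0.1, halved every 50 epochs. Below is a minimal sketch of that schedule, again assuming PyTorch and a momentum of 0.9, since the paper specifies neither.

```python
# Hedged sketch of the quoted schedule: SGD with momentum, initial learning
# rate 0.1, halved every 50 epochs, for 1000 epochs.  Framework and the
# momentum value 0.9 are assumptions; the paper only says "SGD with momentum".
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder for net.parameters()
opt = torch.optim.SGD(params, lr=0.1, momentum=0.9)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.5)

for epoch in range(1000):
    # ... one pass over the training set with cross-entropy loss ...
    opt.step()
    sched.step()   # after epoch k the learning rate is 0.1 * 0.5 ** ((k + 1) // 50)
```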