Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
Authors: Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas, Xiao Wang
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide experimental evidence for our depth separation results by training a neural network of constant width, but with increasing depth, on a classification task that closely resembles the n-alternating points problem that appeared in Telgarsky (2015) and is the foundation of our separation results as well. Our goal is to create a diagram showing how the classification error drops as a function of the depth of the network for a fixed value of the width. |
| Researcher Affiliation | Academia | Vaggos Chatziafratis, Department of Computer Science, Stanford University; Sai Ganesh Nagarajan, Ioannis Panageas & Xiao Wang, Singapore University of Technology and Design |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper describes the creation of a custom dataset: 'We create 8000 equally spaced points from [0,1] (in increasing order), where the first 1000 points are of label 0, the second 1000 are label 1 and this label alternates every 1000 points.' However, no concrete access information (link, DOI, repository, or formal citation for public access) is provided for this dataset. (A hypothetical reconstruction of this dataset is sketched below the table.) |
| Dataset Splits | No | The paper mentions creating '8000 equally spaced points' and discussing 'training error', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper describes its experimental procedure including varying network depth and using the ADAM optimizer, but it does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'ReLUs', 'sigmoid' activation, and the 'ADAM optimizer Kingma & Ba (2014)', but it does not provide specific version numbers for any software dependencies (e.g., programming language, libraries, or frameworks) used in the experiments. |
| Experiment Setup | Yes | To perform the experiments, we vary the depth of the neural network (excluding the input and the output layer) as d = 1, 2, 3, 4, 5. In addition, we fix the number of neurons in each layer to be 6. All activations are ReLUs, while the last layer is the classifier that uses a sigmoid to output probabilities. Each model adds one extra hidden layer, and we use the same hyper-parameters to train all networks. Moreover, we require the training error, i.e., the classification error, to tend to 0 during the training procedure; in other words, we try to overfit the data (as we aim to demonstrate a representation result, rather than a statistical/generalization result). Thus, for the actual training we use the same parameters to train all the different models with the ADAM optimizer (Kingma & Ba, 2014) and set the number of epochs to 200 in order to enable overfitting. (A training sketch following this setup appears below the table.) |
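
For concreteness, the dataset described in the Open Datasets row can be reconstructed as follows. This is a minimal sketch assuming a NumPy-based construction; the paper releases no code, so the array names `x` and `y` and the use of NumPy are illustrative choices, not the authors' implementation.

```python
import numpy as np

# Hypothetical reconstruction of the dataset described in the table above:
# 8000 equally spaced points in [0, 1], where the first 1000 points get label 0,
# the next 1000 get label 1, and the label alternates every 1000 points.
n_points, block = 8000, 1000

x = np.linspace(0.0, 1.0, n_points, dtype=np.float32).reshape(-1, 1)        # inputs in [0, 1]
y = ((np.arange(n_points) // block) % 2).astype(np.float32).reshape(-1, 1)  # alternating labels
```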
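Similarly, the Experiment Setup row can be read as the training loop sketched below. The paper does not name a framework, batch size, or learning rate, so the PyTorch implementation, full-batch gradient updates, and default ADAM hyper-parameters are assumptions made for illustration; only the depths d = 1, ..., 5, the width of 6, the ReLU/sigmoid activations, the ADAM optimizer, and the 200 epochs come from the paper.

```python
import torch
import torch.nn as nn

def build_model(depth: int, width: int = 6) -> nn.Sequential:
    """Constant-width ReLU network: `depth` hidden layers of `width` neurons,
    followed by a single sigmoid unit that outputs class probabilities."""
    layers, in_dim = [], 1
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)

# Same alternating-label data as in the dataset sketch, built directly as tensors.
x_t = torch.linspace(0.0, 1.0, 8000).unsqueeze(1)
y_t = ((torch.arange(8000) // 1000) % 2).float().unsqueeze(1)

train_errors = {}
for d in range(1, 6):                                  # depths d = 1, ..., 5
    model = build_model(depth=d)
    optimizer = torch.optim.Adam(model.parameters())   # ADAM; identical hyper-parameters for every depth
    loss_fn = nn.BCELoss()
    for _ in range(200):                               # 200 epochs, full-batch updates (assumed)
        optimizer.zero_grad()
        loss = loss_fn(model(x_t), y_t)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        preds = (model(x_t) > 0.5).float()
        train_errors[d] = (preds != y_t).float().mean().item()  # final training/classification error
```

The resulting `train_errors` dictionary holds the final training (classification) error for each depth, which is the quantity the authors plot against depth for a fixed width.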