Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem

Authors: Vaggos Chatziafratis, Sai Ganesh Nagarajan, Ioannis Panageas, Xiao Wang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide experimental evidence for our depth separation results by training a neural network of constant width, but with increasing depth, on a classification task that closely resembles the n-alternating points problem that appeared in Telgarsky (2015) and is the foundation of our separation results as well. Our goal is to create a diagram showing how the classification error drops as a function of the depth of the network for a fixed value of the width.
Researcher Affiliation | Academia | Vaggos Chatziafratis, Department of Computer Science, Stanford University; Sai Ganesh Nagarajan, Ioannis Panageas, and Xiao Wang, Singapore University of Technology and Design
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide any links to a code repository.
Open Datasets | No | The paper describes the creation of a custom dataset: 'We create 8000 equally spaced points from [0,1] (in increasing order), where the first 1000 points are of label 0, the second 1000 are label 1 and this label alternates every 1000 points.' However, no concrete access information (link, DOI, repository, or formal citation for public access) is provided for this dataset. (A minimal reconstruction of this dataset is sketched after this table.)
Dataset Splits | No | The paper mentions creating '8000 equally spaced points' and discusses 'training error', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper describes its experimental procedure, including varying network depth and using the ADAM optimizer, but it does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'ReLUs', 'sigmoid' activation, and the 'ADAM optimizer Kingma & Ba (2014)', but it does not provide specific version numbers for any software dependencies (e.g., programming language, libraries, or frameworks) used in the experiments.
Experiment Setup | Yes | To perform the experiments, we vary the depth of the neural network (excluding the input and the output layer) as d = 1, 2, 3, 4, 5. In addition, we fix the number of neurons in each layer to 6. All activations are ReLUs, while the last layer is the classifier, which uses a sigmoid to output probabilities. Each model adds one extra hidden layer, and we use the same hyper-parameters to train all networks. Moreover, we require the training error (the classification error) to tend to 0 during training, i.e., we try to overfit the data (as we aim to demonstrate a representation result rather than a statistical/generalization result). Thus, for the actual training we use the same parameters for all models with the ADAM optimizer (Kingma & Ba, 2014) and set the number of epochs to 200 to enable overfitting. (A hedged training sketch follows the table.)
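
The dataset referenced in the 'Open Datasets' row is fully specified by the quoted sentence, so it can be reconstructed from the text alone. Below is a minimal sketch in NumPy; the use of NumPy and all variable names are assumptions, since the paper does not describe its tooling.

```python
import numpy as np

# Sketch of the dataset described in the paper (tooling assumed, not stated in the paper):
# 8000 equally spaced points in [0, 1], in increasing order, with the label alternating
# every 1000 points (first 1000 points -> label 0, next 1000 -> label 1, and so on).
n_points = 8000
block = 1000

x = np.linspace(0.0, 1.0, n_points).reshape(-1, 1)   # inputs, shape (8000, 1)
y = ((np.arange(n_points) // block) % 2).astype(np.float32).reshape(-1, 1)  # alternating 0/1 labels
```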
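The 'Experiment Setup' row pins down the architecture (hidden width 6, ReLU activations, sigmoid output, depths d = 1, ..., 5) and the training regime (shared hyper-parameters, ADAM, 200 epochs), but not the framework, loss function, learning rate, or batching. The sketch below assumes PyTorch, binary cross-entropy loss, full-batch updates, and ADAM's default learning rate; none of these choices are confirmed by the paper.

```python
import torch
import torch.nn as nn

def make_model(depth, width=6):
    """Constant-width ReLU network with `depth` hidden layers and a sigmoid output."""
    layers = [nn.Linear(1, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers += [nn.Linear(width, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)

def train_and_measure(x, y, depth, epochs=200):
    """Train with ADAM for a fixed number of epochs; return the final training (classification) error."""
    model = make_model(depth)
    optimizer = torch.optim.Adam(model.parameters())  # default learning rate; not reported in the paper
    loss_fn = nn.BCELoss()                            # assumed loss for the sigmoid output
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        preds = (model(x) >= 0.5).float()
        return (preds != y).float().mean().item()

# x, y come from the dataset sketch above
x_t = torch.tensor(x, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32)
errors = {d: train_and_measure(x_t, y_t, d) for d in range(1, 6)}  # depth -> training error
```

Plotting `errors` against depth would give the kind of error-versus-depth diagram the authors describe, though with these assumed hyper-parameters the exact curve may differ from the paper's.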