Understanding Deep Neural Networks with Rectified Linear Units

Authors: Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee

ICLR 2018

Reproducibility assessment. Each entry below lists the variable assessed, the result, and the LLM response.

Research Type: Theoretical
LLM Response: In this paper we investigate the family of functions representable by deep neural networks (DNNs) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to global optimality, with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super-exponential) for approximating a ReLU deep-net function by a shallower ReLU net. Our gap theorems hold for smoothly parametrized families of hard functions, contrary to the countable, discrete families known in the literature. (A small numerical illustration of this kind of depth-size gap is sketched after the assessment.)

Researcher Affiliation: Academia
LLM Response: Raman Arora, Amitabh Basu, Poorya Mianjy, and Anirbit Mukherjee, Johns Hopkins University. Raman Arora: Department of Computer Science, arora@cs.jhu.edu; Amitabh Basu: Department of Applied Mathematics and Statistics, basu.amitabh@jhu.edu; Poorya Mianjy: Department of Computer Science, mianjy@jhu.edu; Anirbit Mukherjee: Department of Applied Mathematics and Statistics, amukhe14@jhu.edu.

Pseudocode: Yes
LLM Response: The paper provides Algorithm 1 (Empirical Risk Minimization); a runnable sketch of this brute-force search is given after the assessment.

    function ERM(D)                              # D = {(x_j, y_j)}_{j=1..D}, x_j in R^n, y_j in R
        S = {+1, -1}^w                           # all possible instantiations of top-layer weights
        P_i = {(P_+^i, P_-^i)}, i = 1, ..., w    # all possible partitions of the data into two parts
        P = P_1 x P_2 x ... x P_w
        count = 1                                # counter
        for s in S do
            for {(P_+^i, P_-^i)}_{i=1..w} in P do
                loss(count) = minimize over {a_i, b_i}:
                                  sum_j ℓ( sum_{i : j in P_+^i} s_i (a_i · x_j + b_i), y_j )
                              subject to: a_i · x_j + b_i <= 0   for all j in P_-^i
                                          a_i · x_j + b_i >= 0   for all j in P_+^i
                count = count + 1
            end for
            OPT = argmin_count loss(count)
        end for
        return {a_i}, {b_i}, s corresponding to the iterate OPT
    end function

Open Source Code: No
LLM Response: The paper does not provide concrete access to source code for the methodology described.

Open Datasets: No
LLM Response: The paper does not provide concrete access information for a publicly available or open dataset. It only discusses abstract data points for an optimization problem (given D data points (x_j, y_j) in R^n × R, j = 1, ..., D, find the function f representable by a 2-layer R^n -> R ReLU DNN of width w that minimizes the empirical risk).

Dataset Splits: No
LLM Response: The paper does not provide dataset split information, as it is a theoretical paper and does not describe empirical experiments.

Hardware Specification: No
LLM Response: The paper does not provide details of hardware used to run experiments, as it is a theoretical paper.

Software Dependencies: No
LLM Response: The paper does not provide ancillary software details with version numbers, as it is a theoretical paper.

Experiment Setup: No
LLM Response: The paper does not contain experimental setup details (hyperparameters, training configurations, or system-level settings), as it is a theoretical paper.
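Two illustrative sketches follow. First, the depth-versus-size gap mentioned under Research Type. The snippet below is not from the paper: it uses the standard tent map, built from two ReLU units, whose depth-k composition is a sawtooth with 2^k linear pieces, whereas a one-hidden-layer ReLU net with m units on a scalar input is piecewise linear with at most m + 1 pieces. The paper's hard functions strengthen this kind of separation to super-exponential gaps for smoothly parametrized families.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def tent(x):
    # Tent map on [0, 1] written with two ReLU units: 2*ReLU(x) - 4*ReLU(x - 1/2).
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, k):
    # Depth-k composition: a deep ReLU net with 2k hidden units computing a 2^k-piece sawtooth.
    for _ in range(k):
        x = tent(x)
    return x

# Count linear pieces on a dyadic grid that contains every breakpoint (valid for k <= 10).
x = np.linspace(0.0, 1.0, 2**10 + 1)
for k in (1, 3, 6):
    y = deep_sawtooth(x, k)
    slopes = np.round(np.diff(y) / np.diff(x), 6)
    pieces = 1 + np.count_nonzero(np.diff(slopes))   # equals 2**k on this grid
    print(f"depth {k}: {pieces} linear pieces")

Reproducing the depth-6 output exactly with a single hidden layer would therefore require at least 63 units, even though the deep net uses only 12.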
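Second, a minimal, assumption-laden sketch of Algorithm 1's search, not the authors' implementation: it takes ℓ to be the squared loss, uses SciPy's SLSQP solver as a stand-in for the exact convex subproblems, and the name erm_relu_width_w is mine. Unlike the paper's analysis, which counts only hyperplane-induced partitions (polynomially many in the data size for fixed input dimension), this sketch enumerates all 2^D two-part partitions per hidden unit, so it is only usable on tiny inputs.

import itertools
import numpy as np
from scipy.optimize import minimize

def erm_relu_width_w(X, y, w):
    """Exhaustive ERM for f(x) = sum_i s_i * max(0, a_i.x + b_i) with squared loss."""
    D, n = X.shape
    best_loss, best_params = np.inf, None
    for s in itertools.product([1.0, -1.0], repeat=w):            # s in {+1, -1}^w
        s = np.asarray(s)
        # masks[i, j] == True means point j is in P_+^i (unit i is active on x_j).
        for flat in itertools.product([False, True], repeat=w * D):
            masks = np.asarray(flat).reshape(w, D)

            def objective(theta):
                A = theta[: w * n].reshape(w, n)
                b = theta[w * n:]
                pre = X @ A.T + b                                  # (D, w) pre-activations
                out = (pre * masks.T) @ s                          # linear on P_+^i, zero on P_-^i
                return np.mean((out - y) ** 2)

            # Constraints: pre-activation >= 0 on P_+^i and <= 0 on P_-^i.
            cons = [{
                "type": "ineq",
                "fun": lambda th, i=i, sgn=np.where(masks[i], 1.0, -1.0):
                    sgn * (X @ th[i * n:(i + 1) * n] + th[w * n + i]),
            } for i in range(w)]

            res = minimize(objective, np.zeros(w * n + w), method="SLSQP", constraints=cons)
            if res.fun < best_loss:
                best_loss = res.fun
                best_params = (res.x[: w * n].reshape(w, n), res.x[w * n:], s)
    return best_loss, best_params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 1))                                # tiny problem: D = 4, n = 1
    y = np.maximum(0.0, X[:, 0]) - 0.5 * np.maximum(0.0, X[:, 0] - 0.3)
    loss, (A, b, s) = erm_relu_width_w(X, y, w=2)
    print("best squared loss found:", loss)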