Understanding Deep Neural Networks with Rectified Linear Units

Authors: Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee

ICLR 2018

Reproducibility assessment. Each entry below lists the variable assessed, the result, and the LLM response.

Research Type: Theoretical
LLM Response: In this paper we investigate the family of functions representable by deep neural networks (DNNs) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to global optimality, with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super-exponential) for approximating a ReLU deep-net function by a shallower ReLU net. Our gap theorems hold for smoothly parametrized families of hard functions, contrary to the countable, discrete families known in the literature. (A small numerical illustration of this kind of depth-size gap is sketched after the assessment.)

Researcher Affiliation: Academia
LLM Response: Raman Arora, Amitabh Basu, Poorya Mianjy, and Anirbit Mukherjee, Johns Hopkins University. Raman Arora: Department of Computer Science, arora@cs.jhu.edu; Amitabh Basu: Department of Applied Mathematics and Statistics, basu.amitabh@jhu.edu; Poorya Mianjy: Department of Computer Science, mianjy@jhu.edu; Anirbit Mukherjee: Department of Applied Mathematics and Statistics, amukhe14@jhu.edu.

Pseudocode: Yes
LLM Response: The paper provides Algorithm 1 (Empirical Risk Minimization); a runnable sketch of this brute-force search is given after the assessment.

    function ERM(D)                              # D = {(x_j, y_j)}_{j=1..D}, x_j in R^n, y_j in R
        S = {+1, -1}^w                           # all possible instantiations of top-layer weights
        P_i = {(P_+^i, P_-^i)}, i = 1, ..., w    # all possible partitions of the data into two parts
        P = P_1 x P_2 x ... x P_w
        count = 1                                # counter
        for s in S do
            for {(P_+^i, P_-^i)}_{i=1..w} in P do
                loss(count) = minimize over {a_i, b_i}:
                                  sum_j ℓ( sum_{i : j in P_+^i} s_i (a_i · x_j + b_i), y_j )
                              subject to: a_i · x_j + b_i <= 0   for all j in P_-^i
                                          a_i · x_j + b_i >= 0   for all j in P_+^i
                count = count + 1
            end for
            OPT = argmin_count loss(count)
        end for
        return {a_i}, {b_i}, s corresponding to the iterate OPT
    end function

Open Source Code: No
LLM Response: The paper does not provide concrete access to source code for the methodology described.

Open Datasets: No
LLM Response: The paper does not provide concrete access information for a publicly available or open dataset. It only discusses abstract data points for an optimization problem (given D data points (x_j, y_j) in R^n × R, j = 1, ..., D, find the function f representable by a 2-layer R^n -> R ReLU DNN of width w that minimizes the empirical risk).

Dataset Splits: No
LLM Response: The paper does not provide dataset split information, as it is a theoretical paper and does not describe empirical experiments.

Hardware Specification: No
LLM Response: The paper does not provide details of hardware used to run experiments, as it is a theoretical paper.

Software Dependencies: No
LLM Response: The paper does not provide ancillary software details with version numbers, as it is a theoretical paper.

Experiment Setup: No
LLM Response: The paper does not contain experimental setup details (hyperparameters, training configurations, or system-level settings), as it is a theoretical paper.
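Two illustrative sketches follow. First, the depth-versus-size gap mentioned under Research Type. The snippet below is not from the paper: it uses the standard tent map, built from two ReLU units, whose depth-k composition is a sawtooth with 2^k linear pieces, whereas a one-hidden-layer ReLU net with m units on a scalar input is piecewise linear with at most m + 1 pieces. The paper's hard functions strengthen this kind of separation to super-exponential gaps for smoothly parametrized families.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def tent(x):
    # Tent map on [0, 1] written with two ReLU units: 2*ReLU(x) - 4*ReLU(x - 1/2).
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_sawtooth(x, k):
    # Depth-k composition: a deep ReLU net with 2k hidden units computing a 2^k-piece sawtooth.
    for _ in range(k):
        x = tent(x)
    return x

# Count linear pieces on a dyadic grid that contains every breakpoint (valid for k <= 10).
x = np.linspace(0.0, 1.0, 2**10 + 1)
for k in (1, 3, 6):
    y = deep_sawtooth(x, k)
    slopes = np.round(np.diff(y) / np.diff(x), 6)
    pieces = 1 + np.count_nonzero(np.diff(slopes))   # equals 2**k on this grid
    print(f"depth {k}: {pieces} linear pieces")

Reproducing the depth-6 output exactly with a single hidden layer would therefore require at least 63 units, even though the deep net uses only 12.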
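Second, a minimal, assumption-laden sketch of Algorithm 1's search, not the authors' implementation: it takes ℓ to be the squared loss, uses SciPy's SLSQP solver as a stand-in for the exact convex subproblems, and the name erm_relu_width_w is mine. Unlike the paper's analysis, which counts only hyperplane-induced partitions (polynomially many in the data size for fixed input dimension), this sketch enumerates all 2^D two-part partitions per hidden unit, so it is only usable on tiny inputs.

import itertools
import numpy as np
from scipy.optimize import minimize

def erm_relu_width_w(X, y, w):
    """Exhaustive ERM for f(x) = sum_i s_i * max(0, a_i.x + b_i) with squared loss."""
    D, n = X.shape
    best_loss, best_params = np.inf, None
    for s in itertools.product([1.0, -1.0], repeat=w):            # s in {+1, -1}^w
        s = np.asarray(s)
        # masks[i, j] == True means point j is in P_+^i (unit i is active on x_j).
        for flat in itertools.product([False, True], repeat=w * D):
            masks = np.asarray(flat).reshape(w, D)

            def objective(theta):
                A = theta[: w * n].reshape(w, n)
                b = theta[w * n:]
                pre = X @ A.T + b                                  # (D, w) pre-activations
                out = (pre * masks.T) @ s                          # linear on P_+^i, zero on P_-^i
                return np.mean((out - y) ** 2)

            # Constraints: pre-activation >= 0 on P_+^i and <= 0 on P_-^i.
            cons = [{
                "type": "ineq",
                "fun": lambda th, i=i, sgn=np.where(masks[i], 1.0, -1.0):
                    sgn * (X @ th[i * n:(i + 1) * n] + th[w * n + i]),
            } for i in range(w)]

            res = minimize(objective, np.zeros(w * n + w), method="SLSQP", constraints=cons)
            if res.fun < best_loss:
                best_loss = res.fun
                best_params = (res.x[: w * n].reshape(w, n), res.x[w * n:], s)
    return best_loss, best_params

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 1))                                # tiny problem: D = 4, n = 1
    y = np.maximum(0.0, X[:, 0]) - 0.5 * np.maximum(0.0, X[:, 0] - 0.3)
    loss, (A, b, s) = erm_relu_width_w(X, y, w=2)
    print("best squared loss found:", loss)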