Understanding Deep Neural Networks with Rectified Linear Units
Authors: Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to global optimality with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super exponential) for approximating a ReLU deep net function by a shallower ReLU net. Our gap theorems hold for smoothly parametrized families of hard functions, contrary to countable, discrete families known in the literature. |
| Researcher Affiliation | Academia | Raman Arora, Amitabh Basu, Poorya Mianjy, and Anirbit Mukherjee, Johns Hopkins University. Raman Arora, Department of Computer Science (arora@cs.jhu.edu); Amitabh Basu, Department of Applied Mathematics and Statistics (basu.amitabh@jhu.edu); Poorya Mianjy, Department of Computer Science (mianjy@jhu.edu); Anirbit Mukherjee, Department of Applied Mathematics and Statistics (amukhe14@jhu.edu). |
| Pseudocode | Yes | Algorithm 1 (Empirical Risk Minimization): function ERM(D), where D = {(x_j, y_j)}_{j=1..D} ⊂ ℝ^n × ℝ. Set S = {+1, −1}^w (all possible instantiations of the top-layer weights) and P = P_1 × P_2 × ⋯ × P_w, where each P_i ranges over all partitions (P^i_+, P^i_−) of the data into two parts. For each s ∈ S and each {(P^i_+, P^i_−)}_{i=1..w} ∈ P, solve the convex program loss(count) = min_{a, b} Σ_j ℓ(Σ_{i : j ∈ P^i_+} s_i(a_i · x_j + b_i), y_j) subject to a_i · x_j + b_i ≤ 0 for all j ∈ P^i_− and a_i · x_j + b_i ≥ 0 for all j ∈ P^i_+, incrementing count after each subproblem. Set OPT = argmin_count loss(count) and return the {a}, {b}, s corresponding to the OPT iterate. (A runnable sketch of this enumeration appears after the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset. It discusses theoretical data points for an optimization problem (given D data points (x_j, y_j) ∈ ℝ^n × ℝ, j = 1, …, D, find the function f represented by a 2-layer ℝ^n → ℝ ReLU DNN of width w that minimizes the empirical risk over the data). |
| Dataset Splits | No | The paper does not report train/validation/test splits; as a theoretical work it describes no empirical experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments, as it is a theoretical paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, as it is a theoretical paper. |
| Experiment Setup | No | The paper does not contain specific experimental setup details (hyperparameters, training configurations, or system-level settings) as it is a theoretical paper. |
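Since the paper releases no code, the following is a minimal runnable sketch of the enumeration in Algorithm 1, under two assumptions of ours: the loss ℓ is squared error, and the inner constrained convex program is handed to `scipy.optimize.minimize` (SLSQP). All names below (`erm`, the toy data) are illustrative, not the authors'. Note also that this sketch naively enumerates all 2^D splits of the data per hidden unit, giving 2^w · 2^(Dw) subproblems; the paper's polynomial-in-data-size bound instead restricts attention to the partitions induced by hyperplanes, of which there are only polynomially many in D for fixed input dimension n.

```python
# Brute-force ERM for a width-w, one-hidden-layer R^n -> R ReLU net,
# following Algorithm 1 as summarized above. Illustrative sketch only:
# squared loss and the SLSQP solver are our assumptions, not the paper's.
import itertools
import numpy as np
from scipy.optimize import minimize

def erm(X, y, w):
    """Globally fit a width-w one-hidden-layer ReLU net to (X, y) by enumeration."""
    D, n = X.shape
    Xb = np.hstack([X, np.ones((D, 1))])      # row j is (x_j, 1), so theta acts affinely
    best_loss, best_params = np.inf, None
    # s: top-layer signs in {+1, -1}^w; masks[i][j] is True iff j is in P_+^i.
    for s in itertools.product([1.0, -1.0], repeat=w):
        s = np.asarray(s)
        for masks in itertools.product(itertools.product([True, False], repeat=D), repeat=w):
            M = np.array(masks, dtype=float)  # (w, D) indicator of P_+^i
            # Feasibility: a_i.x_j + b_i >= 0 on P_+^i and <= 0 on P_-^i,
            # encoded as G @ theta >= 0 with theta = [a_1, b_1, ..., a_w, b_w].
            G = np.zeros((w * D, w * (n + 1)))
            for i in range(w):
                sign = np.where(M[i] == 1.0, 1.0, -1.0)  # flip rows for P_-^i
                G[i * D:(i + 1) * D, i * (n + 1):(i + 1) * (n + 1)] = sign[:, None] * Xb
            def loss(theta, M=M):
                A = theta.reshape(w, n + 1)
                pre = Xb @ A.T                # (D, w) affine pre-activations
                out = (pre * M.T) @ s         # unit i contributes only where j is in P_+^i
                return np.sum((out - y) ** 2) # assumed squared loss
            res = minimize(loss, np.zeros(w * (n + 1)), method="SLSQP",
                           constraints=[{"type": "ineq", "fun": lambda th, G=G: G @ th}])
            if res.success and res.fun < best_loss:
                best_loss, best_params = res.fun, (res.x.reshape(w, n + 1), s)
    return best_loss, best_params

# Toy usage: three points in R^1, width 1 (2^1 * 2^3 = 16 subproblems).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(erm(X, y, w=1))   # recovers the identity map with zero loss
```

The per-partition subproblem is convex in (a, b), since the constraints are linear and the network output is affine on the feasible region, so any local solver suffices there; the exponential cost lives entirely in the outer enumeration, matching the runtime caveat in the abstract.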