Deep ReLU Networks Preserve Expected Length
Authors: Boris Hanin, Ryan Jeong, David Rolnick
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "These theoretical results are corroborated by our experiments." and "We empirically verify that our theoretical results accurately predict observed behavior for networks at initialization, while previous bounds are loose and fail to capture subtle architecture dependencies." |
| Researcher Affiliation | Academia | Boris Hanin, Dept. of Operations Research & Financial Engineering, Princeton University, Princeton, NJ 08544, USA, bhanin@princeton.edu; Ryan Jeong, Dept. of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA, rsjeong@sas.upenn.edu; David Rolnick, School of Computer Science, McGill University, Montréal, QC H3A 0G4, Canada, drolnick@cs.mcgill.ca |
| Pseudocode | No | Explanation: The paper provides detailed mathematical proofs and explanations of its theory but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Explanation: The paper does not contain any statements about releasing code, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | No | Explanation: The paper describes the generation of input 'line segments' for experiments but does not use a publicly available or open dataset, nor does it provide concrete access information for any dataset used. |
| Dataset Splits | No | Explanation: The paper does not specify explicit train/validation/test dataset splits. Experiments involve randomly initialized networks and different initializations. |
| Hardware Specification | No | Explanation: The paper does not provide any specific hardware details such as GPU or CPU models, memory specifications, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | Explanation: The paper does not list specific software dependencies with version numbers (e.g., library names with versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | "For all experiments, weights were initialized from i.i.d. normal distributions with variance 2/fan-in and bias variance 0.1." and "Length distortion is calculated for 500 different initializations of the weights and biases of the network (the weight variance is 2/fan-in)." and "We first approximate the intersections of the line segment with linear region boundaries using a binary search subroutine. Specifically, initialize the set S = [0, 0.5, 1], which will contain the parameter values for these approximations." |
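
The setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code, which the paper does not release. The layer widths, the choice of segment endpoints, the function names, and the linear (non-ReLU) output layer are all assumptions made for illustration. Length distortion is taken here as the ratio of output length to input length. The paper's binary search over linear region boundaries is replaced by dense uniform sampling of the segment, which gives a polyline approximation of the output length rather than the exact value.

```python
import numpy as np

def init_relu_net(widths, rng, bias_var=0.1):
    """Return a list of (W, b) pairs with W ~ N(0, 2/fan_in), b ~ N(0, bias_var)."""
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        b = rng.normal(0.0, np.sqrt(bias_var), size=fan_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Apply the network to a batch x of shape (n_points, n_in).

    ReLU is applied after every layer except the last (an assumption).
    """
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W.T + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

def length_distortion(params, start, end, n_samples=2001):
    """Approximate len(N(segment)) / len(segment) by dense sampling.

    This stands in for the paper's binary search subroutine; the summed
    chord lengths are a lower bound on the true output length.
    """
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    points = (1.0 - t) * start + t * end
    outputs = forward(params, points)
    out_len = np.linalg.norm(np.diff(outputs, axis=0), axis=1).sum()
    return out_len / np.linalg.norm(end - start)

rng = np.random.default_rng(0)
widths = [8, 32, 32, 32, 8]  # hypothetical architecture, not from the paper
start = rng.normal(size=widths[0])  # hypothetical segment endpoints
end = rng.normal(size=widths[0])

# 500 independent initializations, matching the count quoted in the table.
distortions = np.array([
    length_distortion(init_relu_net(widths, rng), start, end)
    for _ in range(500)
])
print("mean distortion:", distortions.mean())
print("std of distortion:", distortions.std())
```

Increasing `n_samples` tightens the approximation, since the network is piecewise linear along the segment and the sampled polyline converges to the true image curve.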