Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?
Authors: Boris Hanin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. Our results show that the empirical variance of the squares of the entries in the input-output Jacobian of N is exponential in a simple architecture-dependent constant β, given by the sum of the reciprocals of the hidden layer widths. ... From this point of view, we rigorously compute finite width corrections to the statistics of gradients at the edge of chaos. The main contributions of this work are: 1. We derive new exact formulas for the joint even moments... 2. We prove that the empirical variance of gradients... 3. We prove that, so long as weights and biases... |
| Researcher Affiliation | Academia | Boris Hanin, Department of Mathematics, Texas A&M University, College Station, TX, USA (bhanin@math.tamu.edu) |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not provide any statement about, or link to, open-source code for its methodology. |
| Open Datasets | Yes | Figure 1: Comparison of early training dynamics on vectorized MNIST for fully connected ReLU nets with various architectures. ... (Figure reprinted with permission from [HR18] with caption modified). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes network characteristics (ReLU activations, random weight/bias initialization) but does not provide specific hyperparameters like learning rate, batch size, or optimizer settings for training. |
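The Research Type row above quotes the paper's central claim: the empirical variance of squared entries of the input-output Jacobian of a randomly initialized fully connected ReLU net grows exponentially in β, the sum of the reciprocals of the hidden layer widths. The following is a minimal sketch (not code from the paper) of how one might probe that claim numerically; it assumes He-style weight initialization, zero biases, and tracks only the (0, 0) Jacobian entry for simplicity.

```python
# Minimal sketch (illustrative only, not from the paper): estimate the spread
# of squared input-output Jacobian entries for randomly initialized fully
# connected ReLU nets and compare it against beta = sum of 1/width over the
# hidden layers. Assumes He-style initialization and zero biases.
import numpy as np

def jacobian_entry_sq(widths, rng):
    """Sample one random ReLU net and return the squared (0, 0) entry of its
    input-output Jacobian at a random Gaussian input."""
    x = rng.standard_normal(widths[0])
    J = np.eye(widths[0])                    # running Jacobian, shape (n_l, n_0)
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        pre = W @ x
        mask = (pre > 0).astype(float)       # ReLU derivative
        J = (W * mask[:, None]) @ J
        x = pre * mask
    return J[0, 0] ** 2

def beta(widths):
    # beta = sum of reciprocals of the hidden layer widths
    return sum(1.0 / n for n in widths[1:-1])

rng = np.random.default_rng(0)
# Same depth, different hidden widths -> different beta.
for widths in ([10] + [100] * 8 + [10], [10] + [20] * 8 + [10]):
    samples = np.array([jacobian_entry_sq(widths, rng) for _ in range(2000)])
    print(f"beta={beta(widths):.3f}  mean={samples.mean():.3f}  var={samples.var():.3f}")
```

Under this setup the mean of the squared Jacobian entries stays roughly comparable across the two architectures, while the narrower net (larger β) shows a markedly larger empirical variance, which is the qualitative behavior the paper's theorems describe.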