Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?

Authors: Boris Hanin

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. Our results show that the empirical variance of the squares of the entries in the input-output Jacobian of N is exponential in a simple architecture-dependent constant β, given by the sum of the reciprocals of the hidden layer widths. ... From this point of view, we rigorously compute finite width corrections to the statistics of gradients at the edge of chaos. The main contributions of this work are: 1. We derive new exact formulas for the joint even moments... 2. We prove that the empirical variance of gradients... 3. We prove that, so long as weights and biases..." (An illustrative sketch of the constant β follows this table.)
Researcher Affiliation | Academia | Boris Hanin, Department of Mathematics, Texas A&M University, College Station, TX, USA. bhanin@math.tamu.edu
Pseudocode | No | No pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for its methodology.
Open Datasets | Yes | "Figure 1: Comparison of early training dynamics on vectorized MNIST for fully connected ReLU nets with various architectures. ... (Figure reprinted with permission from [HR18] with caption modified)."
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers.
Experiment Setup | No | The paper describes network characteristics (ReLU activations, random weight and bias initialization) but does not give training hyperparameters such as learning rate, batch size, or optimizer settings.
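
Note on the result quoted under Research Type: the paper's central quantity is the architecture-dependent constant β, the sum of the reciprocals of the hidden layer widths, and the claim is that the empirical variance of the squared entries of the input-output Jacobian grows exponentially in β. Since the paper ships no code, the NumPy sketch below is only an illustration under assumptions that are not taken from the paper: He-style weight scaling, zero biases, a linear output layer, and two made-up width profiles. It computes β and estimates the variance of a squared Jacobian entry over independent random initializations as a rough stand-in for the paper's empirical variance; it is not a reproduction of the paper's experiments.

import numpy as np


def beta(hidden_widths):
    # Architecture-dependent constant from the paper: the sum of the
    # reciprocals of the hidden layer widths.
    return sum(1.0 / n for n in hidden_widths)


def sample_jacobian_entry(widths, rng):
    # Sample one entry (d output_0 / d input_0) of the input-output Jacobian
    # of a randomly initialized fully connected ReLU net.
    # Assumptions (not from the paper): He-style weight scaling, zero biases,
    # ReLU on hidden layers, linear output layer.
    # widths = [n_in, n_1, ..., n_L, n_out]
    x = rng.standard_normal(widths[0])
    v = np.zeros(widths[0])
    v[0] = 1.0                      # column of the running Jacobian w.r.t. input_0
    last_layer = len(widths) - 2
    for ell, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        W = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        pre = W @ x
        Wv = W @ v
        if ell < last_layer:        # hidden layer: ReLU
            mask = (pre > 0).astype(float)
            x = np.maximum(pre, 0.0)
            v = mask * Wv           # chain rule through the ReLU
        else:                       # output layer: linear
            x = pre
            v = Wv
    return v[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical architectures with the same depth but different width
    # profiles, hence different beta (widths chosen for illustration only).
    archs = {
        "constant width": [784, 100, 100, 100, 100, 100, 10],
        "bottlenecked":   [784, 200, 20, 200, 20, 200, 10],
    }
    for name, widths in archs.items():
        hidden = widths[1:-1]
        sq = np.array([sample_jacobian_entry(widths, rng) ** 2
                       for _ in range(2000)])
        print(f"{name:15s} beta = {beta(hidden):.3f}  "
              f"empirical variance of squared entry ~ {np.var(sq):.3e}")

In the same spirit as the Figure 1 comparison quoted under Open Datasets, the two width profiles share the same depth but differ in β; on the paper's account, the profile with the larger β should show markedly larger fluctuations in its squared Jacobian entries.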