Characterizing Well-Behaved vs. Pathological Deep Neural Networks

Authors: Antoine Labatie

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results can be fully reproduced with the source code available at https://github.com/alabatie/moments-dnns. Figure 2: Slowly diffusing moments of vanilla nets with L = 200 layers of width N_l = 128. (a) Distribution of log ν2(x^l) − log ν2(x^0) for l = 50, 100, 150, 200. (b) Same for log µ2(dx^l) − log µ2(dx^0). Figure 3: Pathology of one-dimensional signal for vanilla nets with L = 200 layers of width N_l = 512. (a) δχ^l such that δχ^l ≃ exp(m[χ^l]) ≥ 1. (b) r_eff(x^l) indicates one-dimensional signal pathology: r_eff(x^l) → 1. Figure 4: Pathology of exploding sensitivity for batch-normalized feedforward nets with L = 200 layers of width N_l = 512. (a) Geometric increments δχ^l decomposed as the product of δ_BN χ^l, defined as the increment from (x^{l-1}, dx^{l-1}) to (z^l, dz^l), and δ_φ χ^l, defined as the increment from (z^l, dz^l) to (x^l, dx^l). (b) The growth of χ^l indicates exploding sensitivity pathology: χ^l ≃ exp(γl) for some γ > 0. (c) x^l becomes ill-conditioned with small r_eff(x^l). (d) z^l becomes fat-tailed distributed with respect to x, α, with large µ4(z^l) and small ν1(|z^l|). Figure 5: Well-behaved evolution of batch-normalized resnets with L = 500 residual units comprised of H = 2 layers of width N = 512. (a) Geometric feedforward increments δχ^{l,1} decomposed as the product of δ_BN χ^{l,1}, defined as the increment from (y^{l,0}, dy^{l,0}) to (z^{l,1}, dz^{l,1}), and δ_φ χ^{l,1}, defined as the increment from (z^{l,1}, dz^{l,1}) to (y^{l,1}, dy^{l,1}). (b) χ^l has power-law growth. (c) r_eff(x^{l,1}) indicates that many directions of signal variance are preserved. (d) µ4(z^{l,1}) and ν1(|z^{l,1}|) indicate that z^{l,1} has a close to Gaussian data distribution. (A sketch of how these moments can be measured appears after this table.)
Researcher Affiliation | Industry | 1Labatie-AI. Correspondence to: Antoine Labatie <antoine@labatie.ai>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our results can be fully reproduced with the source code available at https://github.com/alabatie/moments-dnns.
Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset. It describes the input as a "random tensorial input x ≡ x^0 ∈ R^{n×n×N_0}", suggesting a synthetic or abstract input.
Dataset Splits | No | The paper does not provide specific dataset split information.
Hardware Specification | No | The paper does not provide specific details of the hardware used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers.
Experiment Setup | Yes | We consider networks at initialization, which we suppose is standard following He et al. (2015): (i) weights are initialized with ω^l ∼ N(0, 2 / (K_d^l N_{l-1}) I) and biases are initialized with zeros; (ii) when pre-activations are batch-normalized, the scale and shift batch normalization parameters are initialized with ones and zeros respectively. Figure 2: Slowly diffusing moments of vanilla nets with L = 200 layers of width N_l = 128. Figure 3: Pathology of one-dimensional signal for vanilla nets with L = 200 layers of width N_l = 512. Figure 4: Pathology of exploding sensitivity for batch-normalized feedforward nets with L = 200 layers of width N_l = 512. Figure 5: Well-behaved evolution of batch-normalized resnets with L = 500 residual units comprised of H = 2 layers of width N = 512. (A minimal sketch of this initialization scheme appears after this table.)
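
The initialization quoted in the Experiment Setup row is the He et al. (2015) scheme with fan-in K_d^l N_{l-1} (kernel spatial size times input channels), zero biases, and batch-norm scale/shift parameters set to ones/zeros. Below is a minimal NumPy sketch of that scheme for a single fully connected layer; it is an illustrative simplification (the paper's networks are convolutional), and the function names and the width of 512 are chosen for the example rather than taken from the moments-dnns source code.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init_dense(fan_in, fan_out):
    """He et al. (2015): weights ~ N(0, 2 / fan_in), biases = 0."""
    weights = rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    biases = np.zeros(fan_out)
    return weights, biases

def init_batchnorm_params(num_channels):
    """Batch-norm scale (gamma) initialized to ones, shift (beta) to zeros."""
    return np.ones(num_channels), np.zeros(num_channels)

# Example: one hidden layer of width N_l = 512 fed by N_{l-1} = 512 channels.
# For a convolutional layer, fan_in would be K_d^l * N_{l-1} instead of N_{l-1}.
weights, biases = he_init_dense(fan_in=512, fan_out=512)
gamma, beta = init_batchnorm_params(512)
print(weights.std())  # should be close to sqrt(2 / 512) ~= 0.0625
```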
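
The figure captions quoted above track the signal moment ν2(x^l), the noise moment µ2(dx^l), the normalized sensitivity χ^l, and the effective rank r_eff(x^l) as depth grows. The sketch below shows one way such quantities could be measured on a simplified fully connected vanilla ReLU net at He initialization; the definitions used here (centered vs. non-centered moments, the channel covariance inside r_eff, and the ratio form of χ^l) reflect my reading of the paper rather than code taken from moments-dnns, and the data size, width, and depth are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def nu2(x):
    """Centered second moment over data points and channels (assumed definition)."""
    return np.mean((x - x.mean(axis=0, keepdims=True)) ** 2)

def mu2(dx):
    """Non-centered second moment of the propagated perturbation (assumed definition)."""
    return np.mean(dx ** 2)

def effective_rank(x):
    """r_eff(x) = tr(C) / lambda_max(C) for the channel covariance C (assumed definition)."""
    eigvals = np.linalg.eigvalsh(np.cov(x, rowvar=False))
    return eigvals.sum() / eigvals.max()

# Propagate a random input x^0 and a small perturbation dx^0 through a
# He-initialized vanilla ReLU net (fully connected simplification of the paper's setup).
num_points, width, depth = 1024, 128, 200
x = rng.normal(size=(num_points, width))
dx = 1e-3 * rng.normal(size=(num_points, width))
nu2_0, mu2_0 = nu2(x), mu2(dx)

for layer in range(1, depth + 1):
    w = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
    pre, dpre = x @ w, dx @ w
    mask = (pre > 0).astype(pre.dtype)  # ReLU and its Jacobian applied to the perturbation
    x, dx = pre * mask, dpre * mask

# Quantities of the kind reported in the figure captions (one realization):
print(np.log(nu2(x)) - np.log(nu2_0))             # signal-moment drift (cf. Fig. 2a)
print(np.sqrt(mu2(dx) / mu2_0 * nu2_0 / nu2(x)))  # normalized sensitivity chi^L (assumed form)
print(effective_rank(x))                          # values near 1 signal the one-dimensional pathology (cf. Fig. 3b)
```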