Characterizing Well-Behaved vs. Pathological Deep Neural Networks

Authors: Antoine Labatie

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results can be fully reproduced with the source code available at https://github.com/alabatie/moments-dnns. Figure 2: Slowly diffusing moments of vanilla nets with L = 200 layers of width N_l = 128. (a) Distribution of log ν2(x^l) − log ν2(x^0) for l = 50, 100, 150, 200. (b) Same for log µ2(dx^l) − log µ2(dx^0). Figure 3: Pathology of one-dimensional signal for vanilla nets with L = 200 layers of width N_l = 512. (a) δχ^l such that δχ^l ≃ exp(m[χ^l]) ≥ 1. (b) r_eff(x^l) indicates one-dimensional signal pathology: r_eff(x^l) → 1. Figure 4: Pathology of exploding sensitivity for batch-normalized feedforward nets with L = 200 layers of width N_l = 512. (a) Geometric increments δχ^l decomposed as the product of δ_BN χ^l, defined as the increment from (x^{l-1}, dx^{l-1}) to (z^l, dz^l), and δ_φ χ^l, defined as the increment from (z^l, dz^l) to (x^l, dx^l). (b) The growth of χ^l indicates exploding sensitivity pathology: χ^l ≃ exp(γl) for some γ > 0. (c) x^l becomes ill-conditioned with small r_eff(x^l). (d) z^l becomes fat-tailed distributed with respect to x, α, with large µ4(z^l) and small ν1(|z^l|). Figure 5: Well-behaved evolution of batch-normalized resnets with L = 500 residual units comprised of H = 2 layers of width N = 512. (a) Geometric feedforward increments δχ^{l,1} decomposed as the product of δ_BN χ^{l,1}, defined as the increment from (y^{l,0}, dy^{l,0}) to (z^{l,1}, dz^{l,1}), and δ_φ χ^{l,1}, defined as the increment from (z^{l,1}, dz^{l,1}) to (y^{l,1}, dy^{l,1}). (b) χ^l has power-law growth. (c) r_eff(x^{l,1}) indicates that many directions of signal variance are preserved. (d) µ4(z^{l,1}) and ν1(|z^{l,1}|) indicate that z^{l,1} has a close to Gaussian data distribution. (A sketch of how these moments can be measured appears after this table.)
Researcher Affiliation | Industry | 1Labatie-AI. Correspondence to: Antoine Labatie <antoine@labatie.ai>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our results can be fully reproduced with the source code available at https://github.com/alabatie/moments-dnns.
Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset. It describes the input as a "random tensorial input x ≡ x^0 ∈ R^{n×n×N_0}", suggesting a synthetic or abstract input.
Dataset Splits | No | The paper does not provide specific dataset split information.
Hardware Specification | No | The paper does not provide specific details of the hardware used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers.
Experiment Setup | Yes | We consider networks at initialization, which we suppose is standard following He et al. (2015): (i) weights are initialized with ω^l ∼ N(0, 2 / (K_d^l N_{l-1}) I) and biases are initialized with zeros; (ii) when pre-activations are batch-normalized, the scale and shift batch normalization parameters are initialized with ones and zeros respectively. Figure 2: Slowly diffusing moments of vanilla nets with L = 200 layers of width N_l = 128. Figure 3: Pathology of one-dimensional signal for vanilla nets with L = 200 layers of width N_l = 512. Figure 4: Pathology of exploding sensitivity for batch-normalized feedforward nets with L = 200 layers of width N_l = 512. Figure 5: Well-behaved evolution of batch-normalized resnets with L = 500 residual units comprised of H = 2 layers of width N = 512. (A minimal sketch of this initialization scheme appears after this table.)
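
The initialization quoted in the Experiment Setup row is the He et al. (2015) scheme with fan-in K_d^l N_{l-1} (kernel spatial size times input channels), zero biases, and batch-norm scale/shift parameters set to ones/zeros. Below is a minimal NumPy sketch of that scheme for a single fully connected layer; it is an illustrative simplification (the paper's networks are convolutional), and the function names and the width of 512 are chosen for the example rather than taken from the moments-dnns source code.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init_dense(fan_in, fan_out):
    """He et al. (2015): weights ~ N(0, 2 / fan_in), biases = 0."""
    weights = rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    biases = np.zeros(fan_out)
    return weights, biases

def init_batchnorm_params(num_channels):
    """Batch-norm scale (gamma) initialized to ones, shift (beta) to zeros."""
    return np.ones(num_channels), np.zeros(num_channels)

# Example: one hidden layer of width N_l = 512 fed by N_{l-1} = 512 channels.
# For a convolutional layer, fan_in would be K_d^l * N_{l-1} instead of N_{l-1}.
weights, biases = he_init_dense(fan_in=512, fan_out=512)
gamma, beta = init_batchnorm_params(512)
print(weights.std())  # should be close to sqrt(2 / 512) ~= 0.0625
```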
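
The figure captions quoted above track the signal moment ν2(x^l), the noise moment µ2(dx^l), the normalized sensitivity χ^l, and the effective rank r_eff(x^l) as depth grows. The sketch below shows one way such quantities could be measured on a simplified fully connected vanilla ReLU net at He initialization; the definitions used here (centered vs. non-centered moments, the channel covariance inside r_eff, and the ratio form of χ^l) reflect my reading of the paper rather than code taken from moments-dnns, and the data size, width, and depth are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def nu2(x):
    """Centered second moment over data points and channels (assumed definition)."""
    return np.mean((x - x.mean(axis=0, keepdims=True)) ** 2)

def mu2(dx):
    """Non-centered second moment of the propagated perturbation (assumed definition)."""
    return np.mean(dx ** 2)

def effective_rank(x):
    """r_eff(x) = tr(C) / lambda_max(C) for the channel covariance C (assumed definition)."""
    eigvals = np.linalg.eigvalsh(np.cov(x, rowvar=False))
    return eigvals.sum() / eigvals.max()

# Propagate a random input x^0 and a small perturbation dx^0 through a
# He-initialized vanilla ReLU net (fully connected simplification of the paper's setup).
num_points, width, depth = 1024, 128, 200
x = rng.normal(size=(num_points, width))
dx = 1e-3 * rng.normal(size=(num_points, width))
nu2_0, mu2_0 = nu2(x), mu2(dx)

for layer in range(1, depth + 1):
    w = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
    pre, dpre = x @ w, dx @ w
    mask = (pre > 0).astype(pre.dtype)  # ReLU and its Jacobian applied to the perturbation
    x, dx = pre * mask, dpre * mask

# Quantities of the kind reported in the figure captions (one realization):
print(np.log(nu2(x)) - np.log(nu2_0))             # signal-moment drift (cf. Fig. 2a)
print(np.sqrt(mu2(dx) / mu2_0 * nu2_0 / nu2(x)))  # normalized sensitivity chi^L (assumed form)
print(effective_rank(x))                          # values near 1 signal the one-dimensional pathology (cf. Fig. 3b)
```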