Characterizing Well-Behaved vs. Pathological Deep Neural Networks
Authors: Antoine Labatie
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results can be fully reproduced with the source code available at https://github.com/alabatie/moments-dnns. Figure 2: Slowly diffusing moments of vanilla nets with L = 200 layers of width N_l = 128. (a) Distribution of log ν2(x^l) − log ν2(x^0) for l = 50, 100, 150, 200. (b) Same for log µ2(dx^l) − log µ2(dx^0). Figure 3: Pathology of one-dimensional signal for vanilla nets with L = 200 layers of width N_l = 512. (a) δχ^l such that δχ^l ≃ exp(m[χ^l]) ≥ 1. (b) r_eff(x^l) indicates the one-dimensional signal pathology: r_eff(x^l) → 1. Figure 4: Pathology of exploding sensitivity for batch-normalized feedforward nets with L = 200 layers of width N_l = 512. (a) Geometric increments δχ^l decomposed as the product of δ_BN χ^l, defined as the increment from (x^{l−1}, dx^{l−1}) to (z^l, dz^l), and δ_φ χ^l, defined as the increment from (z^l, dz^l) to (x^l, dx^l). (b) The growth of χ^l indicates the exploding sensitivity pathology: χ^l ≳ exp(γl) for some γ > 0. (c) x^l becomes ill-conditioned with small r_eff(x^l). (d) z^l becomes fat-tailed distributed with respect to x, α, with large µ4(z^l) and small ν1(|z^l|). Figure 5: Well-behaved evolution of batch-normalized resnets with L = 500 residual units comprised of H = 2 layers of width N = 512. (a) Geometric feedforward increments δχ^{l,1} decomposed as the product of δ_BN χ^{l,1}, defined as the increment from (y^{l,0}, dy^{l,0}) to (z^{l,1}, dz^{l,1}), and δ_φ χ^{l,1}, defined as the increment from (z^{l,1}, dz^{l,1}) to (y^{l,1}, dy^{l,1}). (b) χ^l has power-law growth. (c) r_eff(x^{l,1}) indicates that many directions of signal variance are preserved. (d) µ4(z^{l,1}) and ν1(|z^{l,1}|) indicate that z^{l,1} has a close-to-Gaussian data distribution. (A minimal sketch of how these moments can be estimated is given after the table.) |
| Researcher Affiliation | Industry | Labatie-AI. Correspondence to: Antoine Labatie <antoine@labatie.ai>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our results can be fully reproduced with the source code available at https://github.com/alabatie/moments-dnns. |
| Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset. It describes the input as a "random tensorial input x ≡ x^0 ∈ R^{n×n×N_0}", suggesting synthetic or abstract input. |
| Dataset Splits | No | The paper does not provide specific dataset split information. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We consider networks at initialization, which we suppose is standard following He et al. (2015): (i) weights are initialized with ω^l ∼ N(0, 2/(K_d^l N_{l−1}) I), biases are initialized with zeros; (ii) when pre-activations are batch-normalized, scale and shift batch normalization parameters are initialized with ones and zeros respectively. Figure 2: Slowly diffusing moments of vanilla nets with L = 200 layers of width N_l = 128. Figure 3: Pathology of one-dimensional signal for vanilla nets with L = 200 layers of width N_l = 512. Figure 4: Pathology of exploding sensitivity for batch-normalized feedforward nets with L = 200 layers of width N_l = 512. Figure 5: Well-behaved evolution of batch-normalized resnets with L = 500 residual units comprised of H = 2 layers of width N = 512. (An illustrative initialization sketch follows after the table.) |
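
The initialization described in the Experiment Setup row can be written down directly. The following is a minimal, hypothetical sketch (it is not taken from the paper's moments-dnns code; the function name, shapes, and the 2-D convolutional layout are assumptions made for illustration) of He-initialized weights with zero biases and batch-norm scale/shift parameters set to ones/zeros:

```python
# Hypothetical sketch of the described initialization (He et al., 2015) for a
# 2-D convolutional layer followed by batch normalization. Names and shapes
# are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def init_conv_bn(kernel_size, n_in, n_out):
    fan_in = kernel_size * kernel_size * n_in          # plays the role of K_d^l * N_{l-1} for d = 2
    weights = rng.normal(0.0, np.sqrt(2.0 / fan_in),
                         size=(kernel_size, kernel_size, n_in, n_out))
    biases = np.zeros(n_out)                           # (i) biases start at zero
    bn_scale = np.ones(n_out)                          # (ii) batch-norm scale initialized to ones
    bn_shift = np.zeros(n_out)                         # (ii) batch-norm shift initialized to zeros
    return weights, biases, bn_scale, bn_shift

W, b, gamma, beta = init_conv_bn(kernel_size=3, n_in=512, n_out=512)
print(W.std(), np.sqrt(2.0 / (3 * 3 * 512)))           # empirical vs. target standard deviation
```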
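To make the moments quoted in the figure captions concrete, here is a second hedged sketch (again hypothetical, not the paper's code) that propagates a random signal x^l and a random perturbation dx^l through a fully connected vanilla ReLU net at He initialization and prints proxies for ν2(x^l), µ2(dx^l), and r_eff(x^l). The fully connected setting, the sample count, and the trace-over-largest-eigenvalue definition of effective rank are simplifying assumptions:

```python
# Hypothetical NumPy sketch: track signal and sensitivity moments of a vanilla
# ReLU net at He initialization, layer by layer (fully connected for brevity).
import numpy as np

rng = np.random.default_rng(0)

def effective_rank(x):
    """Proxy for r_eff: trace over largest eigenvalue of the second-moment matrix."""
    C = x.T @ x / x.shape[0]
    eigvals = np.linalg.eigvalsh(C)
    return eigvals.sum() / eigvals.max()

n_samples, width, depth = 256, 128, 200
x = rng.standard_normal((n_samples, width))    # signal x^0
dx = rng.standard_normal((n_samples, width))   # sensitivity / perturbation dx^0

nu2_0 = np.mean(x**2)                          # proxy for ν2(x^0)
mu2_0 = np.mean(dx**2)                         # proxy for µ2(dx^0) (dx is ~zero-mean here)

for l in range(1, depth + 1):
    W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))  # He init
    z, dz = x @ W, dx @ W                      # pre-activation and its perturbation
    mask = (z > 0).astype(x.dtype)             # ReLU derivative
    x, dx = z * mask, dz * mask                # φ(z) and first-order propagated dx
    if l % 50 == 0:
        print(f"l={l:3d}  log ν2(x^l) - log ν2(x^0) = {np.log(np.mean(x**2) / nu2_0):+.2f}  "
              f"log µ2(dx^l) - log µ2(dx^0) = {np.log(np.mean(dx**2) / mu2_0):+.2f}  "
              f"r_eff(x^l) = {effective_rank(x):.1f}")
```

Under this kind of propagation, the printed log-moment differences for a single realization and the decaying effective rank are the single-run analogues of the distributions shown in Figures 2 and 3.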