Width and Depth Limits Commute in Residual Networks
Authors: Soufiane Hayou, Greg Yang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive simulations that show an excellent match with our theoretical findings. |
| Researcher Affiliation | Collaboration | (1) Department of Mathematics, National University of Singapore; (2) Microsoft Research AI. |
| Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | The paper conducts simulations on randomly generated inputs for theoretical validation, rather than using a publicly available or open dataset. |
| Dataset Splits | No | The paper conducts simulations to validate theoretical findings, but it does not specify dataset splits for training, validation, or testing, as it does not use a pre-existing dataset. |
| Hardware Specification | No | The paper focuses on theoretical analysis and simulations, but it does not specify any hardware details (e.g., GPU/CPU models, memory) used for conducting these experiments. |
| Software Dependencies | No | The paper mentions 'PDE solver (RK45 method, Fehlberg, 1968)' as an approximation method for theoretical prediction but does not specify any software libraries or their version numbers used for the simulations. |
| Experiment Setup | Yes | To empirically validate this finding, we show in Fig. 2 the histograms of the first neuron in the last layer (t = 1 in Theorem 1) for a randomly chosen input a and n, L ∈ {5, 50, 500}. We also perform a Kolmogorov-Smirnov normality test and report the statistic (KS) and the p-value. As can be seen in Fig. 2, the histograms appear to fit the theoretical Gaussian distribution more closely as width and depth increase. The histogram is based on N = 10^4 simulations. In Fig. 5, we compare the empirical covariance q̂_t with the theoretical prediction q_t for (n, L) ∈ {5, 50, 500, 5000}. The average is calculated based on N = 100 simulations. The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg, 1968) for t ∈ [0, 1] with a discretization step Δt = 1e-4. |
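
Since no source code is released, the histogram experiment described in the Experiment Setup row can only be sketched under assumptions. The block below is a minimal illustration, not the authors' implementation. It assumes a ReLU residual network of the form Y_0 = W_in a, Y_l = Y_{l-1} + L^{-1/2} W_l ReLU(Y_{l-1}), with Gaussian weights of variance 1/d (input layer) and 1/n (hidden layers); the architecture, the function names `simulate_first_neuron` and `ks_statistic_standard_normal`, and the small sizes used are all hypothetical. Because the theoretical variance from Theorem 1 is not restated here, the samples are standardized with their own mean and standard deviation before the Kolmogorov-Smirnov statistic is computed against a standard normal, and no p-value is reported.

```python
import math
import numpy as np


def simulate_first_neuron(n, L, a, num_sims, rng):
    """Draw num_sims samples of the first neuron in the last layer.

    Assumed model (not confirmed by the summary above):
    Y_0 = W_in a,  Y_l = Y_{l-1} + W_l ReLU(Y_{l-1}) / sqrt(L).
    """
    d = a.shape[0]
    out = np.empty(num_sims)
    for s in range(num_sims):
        # Assumed input layer: entries of W_in are N(0, 1/d).
        y = rng.normal(0.0, 1.0 / math.sqrt(d), size=(n, d)) @ a
        for _ in range(L):
            # Assumed hidden weights: entries of W_l are N(0, 1/n).
            w = rng.normal(0.0, 1.0 / math.sqrt(n), size=(n, n))
            y = y + (w @ np.maximum(y, 0.0)) / math.sqrt(L)
        out[s] = y[0]
    return out


def ks_statistic_standard_normal(x):
    """KS distance between the standardized sample and N(0, 1)."""
    z = np.sort((x - x.mean()) / x.std(ddof=1))
    # Standard normal CDF evaluated at each sorted sample point.
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    m = len(z)
    d_plus = np.max(np.arange(1, m + 1) / m - cdf)
    d_minus = np.max(cdf - np.arange(0, m) / m)
    return float(max(d_plus, d_minus))


# Small illustrative sizes; the summary reports n, L up to 500 and N = 10^4.
rng = np.random.default_rng(0)
a = rng.normal(size=10)  # randomly chosen input, dimension assumed
samples = simulate_first_neuron(n=20, L=20, a=a, num_sims=300, rng=rng)
ks = ks_statistic_standard_normal(samples)
print(f"KS statistic: {ks:.4f}")
```

Rerunning the last lines with larger `n` and `L` is the natural way to check the reported trend that the fit improves as width and depth grow; the sizes here are kept small only so the sketch runs quickly.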