Width and Depth Limits Commute in Residual Networks

Authors: Soufiane Hayou, Greg Yang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive simulations that show an excellent match with our theoretical findings.
Researcher Affiliation | Collaboration | (1) Department of Mathematics, National University of Singapore; (2) Microsoft Research AI.
Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | No | The paper conducts simulations on randomly generated inputs for theoretical validation, rather than using a publicly available or open dataset.
Dataset Splits | No | The paper conducts simulations to validate theoretical findings, but it does not specify dataset splits for training, validation, or testing, as it does not use a pre-existing dataset.
Hardware Specification | No | The paper focuses on theoretical analysis and simulations, but it does not specify any hardware details (e.g., GPU/CPU models, memory) used for conducting these experiments.
Software Dependencies | No | The paper mentions 'PDE solver (RK45 method, Fehlberg, 1968)' as an approximation method for theoretical prediction but does not specify any software libraries or their version numbers used for the simulations.
Experiment Setup | Yes | To empirically validate this finding, we show in Fig. 2 the histograms of the first neuron in the last layer (t = 1 in Theorem 1) for a randomly chosen input a and n, L ∈ {5, 50, 500}. We also perform a Kolmogorov-Smirnov normality test and report the statistic (KS) and the p-value. As can be seen in Fig. 2, the histograms appear to fit the theoretical Gaussian distribution more closely as width and depth increase. The histogram is based on N = 10^4 simulations. In Fig. 5, we compare the empirical covariance q̂_t with the theoretical prediction q_t for (n, L) ∈ {5, 50, 500, 5000}. The average is calculated based on N = 100 simulations. The theoretical prediction q_t is approximated with a PDE solver (RK45 method, Fehlberg, 1968) for t ∈ [0, 1] with a discretization step Δt = 1e-4.
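The experiment setup above (sample the first neuron of the last layer over many random initializations, then measure its distance to a Gaussian) can be sketched roughly as follows. This is a minimal illustration, not the authors' code, which is not released. The architecture is an assumption that this summary does not state: ReLU activation, a 1/sqrt(L) factor on each residual branch, weight entries drawn from N(0, 1/n), and an input dimension of 4. The function names are hypothetical. The sketch computes only the Kolmogorov-Smirnov statistic after standardizing with the sample mean and standard deviation, and omits the p-value.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def last_layer_first_neuron(a, n, L, rng):
    """One random forward pass; returns the first neuron of the last layer.

    Assumed model (not taken from the table above):
      Y_0 = W_in a,                      W_in entries ~ N(0, 1/d)
      Y_l = Y_{l-1} + W_l relu(Y_{l-1}) / sqrt(L),  W_l entries ~ N(0, 1/n)
    """
    d = len(a)
    y = [sum(rng.gauss(0.0, 1.0) * a_j for a_j in a) / math.sqrt(d)
         for _ in range(n)]
    # Combines the N(0, 1/n) weight scale with the 1/sqrt(L) branch factor.
    scale = 1.0 / math.sqrt(n * L)
    for _ in range(L):
        h = [relu(v) for v in y]
        y = [v + scale * sum(rng.gauss(0.0, 1.0) * h_j for h_j in h)
             for v in y]
    return y[0]


def simulate(n, L, num_sims, seed=0):
    """Draw num_sims independent networks and evaluate them on one fixed input."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(4)]  # arbitrary input, d = 4
    return [last_layer_first_neuron(a, n, L, rng) for _ in range(num_sims)]


def ks_statistic_vs_normal(samples):
    """KS distance between the standardized samples and the N(0, 1) CDF."""
    m = len(samples)
    mean = sum(samples) / m
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (m - 1))
    z = sorted((s - mean) / std for s in samples)
    cdf = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z]
    # Largest gap between the empirical step function and the normal CDF.
    return max(max((i + 1) / m - c, c - i / m) for i, c in enumerate(cdf))


if __name__ == "__main__":
    # Small sizes keep the pure-Python run short; the paper uses up to 500.
    for size in (2, 8):
        samples = simulate(n=size, L=size, num_sims=400)
        print(size, round(ks_statistic_vs_normal(samples), 3))
```

Because the mean and variance are estimated from the same samples, the statistic here is closer to a Lilliefors-style check than to a test against a fixed theoretical Gaussian. Matching the paper's figures would require the limiting variance from Theorem 1 and N = 10^4 simulations, for which a vectorized implementation would be more practical.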