The future is log-Gaussian: ResNets and their infinite-depth-and-width limit at initialization

Authors: Mufan Li, Mihai Nica, Dan Roy

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To provide a better approximation, we study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, d/n, remains constant. In contrast to the Gaussian infinite-width limit, we show theoretically that the network exhibits log-Gaussian behaviour at initialization in the infinite-depth-and-width limit, with parameters depending on the ratio d/n. Using Monte Carlo simulations, we demonstrate that even basic properties of standard ResNet architectures are poorly captured by the Gaussian limit, but remarkably well captured by our log-Gaussian limit. Based on Monte Carlo simulations, we find excellent agreement between our predictions and finite networks (see Figure 1). (A Monte Carlo sketch of this kind of check appears after the table.)
Researcher Affiliation | Academia | Mufan (Bill) Li (University of Toronto, Vector Institute); Mihai Nica (University of Guelph, Vector Institute); Daniel M. Roy (University of Toronto, Vector Institute). Correspondence: mufan.li@mail.utoronto.ca; nicam@uoguelph.ca; daniel.roy@utoronto.ca.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks; it uses mathematical equations and descriptions of network architectures.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper conducts Monte Carlo simulations to verify theoretical predictions about neural networks at initialization rather than training models on datasets, and it does not mention using or providing access to any publicly available dataset.
Dataset Splits | No | The paper focuses on theoretical limits and Monte Carlo simulations of network properties at initialization, not on training deep learning models on datasets, so no training, validation, or test splits are mentioned.
Hardware Specification | No | The paper does not specify the hardware used for its Monte Carlo simulations (e.g., GPU/CPU models, memory, or cloud resources).
Software Dependencies | No | The paper cites software tools such as JAX, PyTorch, and NumPy in its references, some with version numbers in the bibliographic entries, but it does not state which of these were used as dependencies for its experiments, or with which versions, in the main text or in a dedicated description of the software environment.
Experiment Setup | No | The paper analyzes networks at initialization via Monte Carlo simulations rather than training deep learning models. It defines network parameters (e.g., "All networks have n = 100, nin = nout = 10, α = λ = 1/2"; see the second sketch below), but it does not provide training hyperparameters (learning rate, batch size, epochs), optimizers, or other system-level training configurations typically found in papers describing trained models.
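
As a concrete illustration of the Monte Carlo check described in the Research Type row, the sketch below samples many randomly initialized ReLU residual networks at a fixed depth-to-width ratio d/n and inspects whether log ||output||² looks approximately Gaussian. The residual block used here (x ← x + W ReLU(x)/√n with i.i.d. standard Gaussian weights) is a simplified stand-in rather than the paper's exact architecture, and the function and parameter names are illustrative only.

```python
# Hedged Monte Carlo sketch: this is NOT the paper's exact architecture or code,
# only a simplified stand-in used to illustrate the log-Gaussian check.
import numpy as np

def sample_log_sq_norm(depth, width, n_in=10, rng=None):
    """One forward pass of a randomly initialized residual net; returns log ||x_depth||^2."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.ones(n_in) / np.sqrt(n_in)                              # fixed unit-norm input
    x = rng.standard_normal((width, n_in)) @ x / np.sqrt(n_in)     # input projection to width n
    for _ in range(depth):
        W = rng.standard_normal((width, width))
        x = x + W @ np.maximum(x, 0.0) / np.sqrt(width)            # assumed residual ReLU block
    return float(np.log(np.sum(x ** 2)))

rng = np.random.default_rng(0)
depth, width, n_samples = 50, 100, 500                             # fixed ratio d/n = 0.5
samples = np.array([sample_log_sq_norm(depth, width, rng=rng) for _ in range(n_samples)])

# Under a log-Gaussian limit, the log squared norms should look roughly Gaussian,
# with mean and variance depending on the ratio d/n; near-zero skewness is a quick check.
mean, std = samples.mean(), samples.std()
skew = float(np.mean(((samples - mean) / std) ** 3))
print(f"log ||x||^2: mean={mean:.3f}, std={std:.3f}, skewness={skew:.3f}")
```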
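
The Experiment Setup row quotes the settings n = 100, nin = nout = 10, α = λ = 1/2. The sketch below shows one plausible way those settings could enter a single forward pass at initialization (input projection, α/λ-weighted residual blocks, output projection). The placement of α and λ in the block, and the He-style √(2/n) factor, are assumptions for illustration, not the paper's definition.

```python
# Hedged sketch of the quoted settings at initialization only (no training).
# The residual block below is an assumed parameterization, not the paper's definition.
import numpy as np

def init_and_forward(depth, n=100, n_in=10, n_out=10, alpha=0.5, lam=0.5, seed=0):
    """One forward pass at initialization with the quoted (illustrative) settings."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_in)                                  # random input of size n_in
    x = rng.standard_normal((n, n_in)) @ x / np.sqrt(n_in)         # input projection to width n
    for _ in range(depth):
        W = rng.standard_normal((n, n))
        # Assumed block: with lam + alpha = 1 the squared norm is roughly preserved in
        # expectation (ReLU keeps about half the variance, hence the sqrt(2/n) factor).
        x = np.sqrt(lam) * x + np.sqrt(alpha) * (W @ np.maximum(x, 0.0)) * np.sqrt(2.0 / n)
    return rng.standard_normal((n_out, n)) @ x / np.sqrt(n)        # output projection to n_out

out = init_and_forward(depth=50)
print("output dimension:", out.shape[0], " squared output norm:", float(out @ out))
```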