Global Convergence of Three-layer Neural Networks in the Mean Field Regime

Authors: Huy Tuan Pham, Phan-Minh Nguyen

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a neuronal embedding, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which, unlike previous works on two-layer networks, does not rely critically on convexity. Underlying the result is a universal approximation property, natural to neural networks, which importantly is shown to hold at any finite training time (not necessarily at convergence) via an algebraic topology argument. Complete proofs are presented in the appendices.
Researcher Affiliation | Collaboration | Department of Mathematics, Stanford University. This work was done in part while H. T. Pham was at the University of Cambridge. The Voleon Group. This work was done while P.-M. Nguyen was at Stanford University.
Pseudocode | No | The paper presents mathematical equations and theoretical derivations, but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about making its source code open or providing links to code repositories.
Open Datasets | No | The paper mentions drawing samples 'from a training distribution P' but does not refer to or provide access information for a specific, publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not specify any training, validation, or test dataset splits for experimental reproduction.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware (e.g., GPU/CPU models, processors, memory details) used for computational experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper describes the theoretical setup for SGD training, including a generic learning rate ϵ, but does not provide concrete numerical values for it or for other typical experimental hyperparameters (e.g., batch size, optimizer settings) that would define a reproducible experimental setup. A minimal illustrative sketch of such a setup is given after this table.
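
Since the paper provides no code and no concrete training configuration, the following is a minimal sketch, assuming a plain NumPy implementation, of the kind of setup it analyzes: a three-layer feedforward network with mean-field 1/n averaging over each hidden layer, trained by one-sample SGD with learning rate eps. The widths, ReLU activations, learning rate, step count, synthetic data distribution, and squared loss below are all illustrative assumptions, not values taken from the paper.

import numpy as np

# Minimal sketch (not the authors' code): a three-layer feedforward network with
# mean-field 1/n averaging over each hidden layer, trained by one-sample SGD.
# Widths, activations, the learning rate eps, the step count, the synthetic data
# distribution, and the squared loss are all assumptions for illustration only.

rng = np.random.default_rng(0)
d, n1, n2 = 10, 1000, 1000          # input dimension and hidden-layer widths (assumed)
eps, steps = 0.1, 5000              # learning rate and number of SGD steps (assumed)

W1 = rng.normal(size=(n1, d))       # first-layer weights
W2 = rng.normal(size=(n2, n1))      # second-layer weights
w3 = rng.normal(size=n2)            # output weights


def relu(z):
    return np.maximum(z, 0.0)


def drelu(z):
    return (z > 0.0).astype(z.dtype)


def forward(x):
    """Forward pass, returning pre-activations and activations for backprop."""
    z1 = W1 @ x
    h1 = relu(z1)
    z2 = W2 @ h1 / n1               # mean-field average over the first hidden layer
    h2 = relu(z2)
    yhat = w3 @ h2 / n2             # mean-field average over the second hidden layer
    return z1, h1, z2, h2, yhat


for t in range(steps):
    # Draw one sample (x, y); a synthetic stand-in for the training distribution P.
    x = rng.normal(size=d)
    y = np.sin(x[0])                # synthetic target (assumed)

    z1, h1, z2, h2, yhat = forward(x)
    err = yhat - y                  # derivative of the squared loss (1/2)(yhat - y)^2

    # Backpropagate through the mean-field scalings.
    g3 = err * h2 / n2                          # gradient w.r.t. w3
    g_pre2 = (err * w3 / n2) * drelu(z2)
    g2 = np.outer(g_pre2, h1) / n1              # gradient w.r.t. W2
    g_pre1 = (W2.T @ g_pre2 / n1) * drelu(z1)
    g1 = np.outer(g_pre1, x)                    # gradient w.r.t. W1

    # One-sample SGD update with learning rate eps.
    w3 -= eps * g3
    W2 -= eps * g2
    W1 -= eps * g1

In the paper's mean-field regime the hidden widths n1 and n2 are taken large, and the 1/n1 and 1/n2 averaging factors are what make that large-width limit well defined; the single constant learning rate used here is a simplification for the sketch rather than the paper's prescription.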