Global Convergence of Three-layer Neural Networks in the Mean Field Regime

Authors: Huy Tuan Pham, Phan-Minh Nguyen

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a neuronal embedding, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which, unlike previous works on two-layer networks, does not rely critically on convexity. Underlying the result is a universal approximation property, natural to neural networks, which importantly is shown to hold at any finite training time (not necessarily at convergence) via an algebraic topology argument. Complete proofs are presented in the appendices.
Researcher Affiliation | Collaboration | Department of Mathematics, Stanford University. This work was done in part while H. T. Pham was at the University of Cambridge. The Voleon Group. This work was done while P.-M. Nguyen was at Stanford University.
Pseudocode | No | The paper presents mathematical equations and theoretical derivations, but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about making its source code open or providing links to code repositories.
Open Datasets | No | The paper mentions drawing samples 'from a training distribution P' but does not refer to or provide access information for a specific, publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not specify any training, validation, or test dataset splits for experimental reproduction.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware (e.g., GPU/CPU models, processors, memory details) used for computational experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper describes the theoretical setup for SGD training, including a generic learning rate ϵ, but does not provide concrete numerical values for it or for other typical experimental hyperparameters (e.g., batch size, optimizer settings) that would define a reproducible experimental setup. A minimal illustrative sketch of such a setup is given after this table.
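
Since the paper provides no code and no concrete training configuration, the following is a minimal sketch, assuming a plain NumPy implementation, of the kind of setup it analyzes: a three-layer feedforward network with mean-field 1/n averaging over each hidden layer, trained by one-sample SGD with learning rate eps. The widths, ReLU activations, learning rate, step count, synthetic data distribution, and squared loss below are all illustrative assumptions, not values taken from the paper.

import numpy as np

# Minimal sketch (not the authors' code): a three-layer feedforward network with
# mean-field 1/n averaging over each hidden layer, trained by one-sample SGD.
# Widths, activations, the learning rate eps, the step count, the synthetic data
# distribution, and the squared loss are all assumptions for illustration only.

rng = np.random.default_rng(0)
d, n1, n2 = 10, 1000, 1000          # input dimension and hidden-layer widths (assumed)
eps, steps = 0.1, 5000              # learning rate and number of SGD steps (assumed)

W1 = rng.normal(size=(n1, d))       # first-layer weights
W2 = rng.normal(size=(n2, n1))      # second-layer weights
w3 = rng.normal(size=n2)            # output weights


def relu(z):
    return np.maximum(z, 0.0)


def drelu(z):
    return (z > 0.0).astype(z.dtype)


def forward(x):
    """Forward pass, returning pre-activations and activations for backprop."""
    z1 = W1 @ x
    h1 = relu(z1)
    z2 = W2 @ h1 / n1               # mean-field average over the first hidden layer
    h2 = relu(z2)
    yhat = w3 @ h2 / n2             # mean-field average over the second hidden layer
    return z1, h1, z2, h2, yhat


for t in range(steps):
    # Draw one sample (x, y); a synthetic stand-in for the training distribution P.
    x = rng.normal(size=d)
    y = np.sin(x[0])                # synthetic target (assumed)

    z1, h1, z2, h2, yhat = forward(x)
    err = yhat - y                  # derivative of the squared loss (1/2)(yhat - y)^2

    # Backpropagate through the mean-field scalings.
    g3 = err * h2 / n2                          # gradient w.r.t. w3
    g_pre2 = (err * w3 / n2) * drelu(z2)
    g2 = np.outer(g_pre2, h1) / n1              # gradient w.r.t. W2
    g_pre1 = (W2.T @ g_pre2 / n1) * drelu(z1)
    g1 = np.outer(g_pre1, x)                    # gradient w.r.t. W1

    # One-sample SGD update with learning rate eps.
    w3 -= eps * g3
    W2 -= eps * g2
    W1 -= eps * g1

In the paper's mean-field regime the hidden widths n1 and n2 are taken large, and the 1/n1 and 1/n2 averaging factors are what make that large-width limit well defined; the single constant learning rate used here is a simplification for the sketch rather than the paper's prescription.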