Benign Overfitting in Deep Neural Networks under Lazy Training
Authors: Zhenyu Zhu, Fanghui Liu, Grigorios Chrysos, Francesco Locatello, Volkan Cevher
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining (nearly) zero training error under the lazy training regime. For this purpose, we unify three interrelated concepts: over-parameterization, benign overfitting, and the Lipschitz constant of DNNs. Our results indicate that interpolating with smoother functions leads to better generalization. Furthermore, we investigate the special case where interpolation of smooth ground-truth functions is performed by DNNs under the Neural Tangent Kernel (NTK) regime for generalization. Our result demonstrates that the generalization error converges to a constant order that only depends on label noise and initialization noise, which theoretically verifies benign overfitting. Our analysis provides a tight lower bound on the normalized margin under non-smooth activation functions, as well as on the minimum eigenvalue of the NTK under high-dimensional settings, which is of independent interest in learning theory. |
| Researcher Affiliation | Collaboration | (1) Laboratory for Information and Inference Systems, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; (2) Amazon Web Services (work done outside of Amazon). |
| Pseudocode | Yes | Algorithm 1 SGD for training DNNs |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We also empirically verify our assumption on MNIST (LeCun et al., 1998) with ten digits from 0 to 9. |
| Dataset Splits | No | The paper discusses 'training data' and 'test error' but does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages, sample counts, or specific predefined splits with citations). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments or computations are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') are mentioned in the paper. |
| Experiment Setup | Yes | Given a DNN defined by Eq. (1) and trained by Algorithm 1 with a step size α ≲ L^{-2}(log m)^{-5/2}. Then under Assumptions 1 and 2, for ω ≤ O(L^{-9/2}(log m)^{-3}) and λ > 0, with probability at least 1 − O(nL²)·exp(−Ω(m·ω^{2/3}·L))... We use NTK initialization (Allen-Zhu et al., 2019b) in this section, but the main result can be easily extended to more initializations, such as He (He et al., 2015) and LeCun (LeCun et al., 2012). (See the illustrative training sketch after this table.) |
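
The paper reports no released code, so as a rough illustration of the "Pseudocode" and "Experiment Setup" rows above, here is a minimal PyTorch sketch of SGD training for a ReLU DNN with an NTK-style Gaussian initialization. The function names (`make_ntk_mlp`, `train_sgd`), the variance scaling, and all hyperparameter values are assumptions for illustration only, not the authors' Algorithm 1 or their exact setup.

```python
# Hypothetical sketch of SGD training for a ReLU DNN with NTK-style
# initialization; width m, depth L, and step size alpha are placeholders.
import torch
import torch.nn as nn

def make_ntk_mlp(d_in, m, L, d_out=1):
    """ReLU MLP; hidden weights drawn from N(0, 2/fan_in), no biases (assumed)."""
    layers, width_in = [], d_in
    for _ in range(L):
        lin = nn.Linear(width_in, m, bias=False)
        nn.init.normal_(lin.weight, std=(2.0 / width_in) ** 0.5)
        layers += [lin, nn.ReLU()]
        width_in = m
    head = nn.Linear(width_in, d_out, bias=False)
    nn.init.normal_(head.weight, std=(1.0 / width_in) ** 0.5)
    layers.append(head)
    return nn.Sequential(*layers)

def train_sgd(model, X, y, alpha, num_steps):
    """Plain full-batch SGD on the squared loss, one step per iteration."""
    opt = torch.optim.SGD(model.parameters(), lr=alpha)
    loss_fn = nn.MSELoss()
    for _ in range(num_steps):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model

# Example usage with synthetic data (shapes only, not the paper's experiments).
if __name__ == "__main__":
    n, d, m, L = 128, 20, 512, 3
    X = torch.randn(n, d)
    y = torch.sign(X[:, 0]).float()      # toy binary labels in {-1, +1}
    # Step size of the order 1 / (L^2 (log m)^{5/2}), mirroring the theorem statement.
    alpha = 1.0 / (L**2 * torch.log(torch.tensor(float(m))) ** 2.5)
    model = make_ntk_mlp(d, m, L)
    train_sgd(model, X, y, alpha.item(), num_steps=200)
```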
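Likewise, the minimum eigenvalue of the NTK Gram matrix highlighted in the abstract can be checked numerically. The sketch below is an assumption-laden illustration, not the paper's analysis: it builds the empirical (finite-width) NTK from per-sample parameter gradients of the toy model above and reports its smallest eigenvalue; `empirical_ntk_min_eig` and all sizes are hypothetical.

```python
# Hypothetical numerical check of the empirical NTK Gram matrix and its
# minimum eigenvalue; reuses make_ntk_mlp from the sketch above.
import torch

def empirical_ntk_min_eig(model, X):
    """K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>; returns lambda_min(K)."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for i in range(X.shape[0]):
        out = model(X[i:i + 1]).sum()
        g = torch.autograd.grad(out, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    G = torch.stack(grads)        # n x (number of parameters)
    K = G @ G.T                   # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(K).min()

# Example: for well-separated, high-dimensional inputs (the regime the paper
# analyzes), this eigenvalue should stay bounded away from zero.
X = torch.randn(64, 20)
model = make_ntk_mlp(20, 256, 2)
print(empirical_ntk_min_eig(model, X).item())
```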