Benign Overfitting in Two-layer Convolutional Neural Networks

Authors: Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss. These together demonstrate a sharp phase transition between benign overfitting and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks.
Researcher Affiliation | Academia | Yuan Cao, Department of Statistics & Actuarial Science and Department of Mathematics, The University of Hong Kong (yuancao@hku.hk); Zixiang Chen, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (chenzx19@cs.ucla.edu); Mikhail Belkin, Halicioğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093, USA (mbelkin@ucsd.edu); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu)
Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not mention providing open-source code for the described methodology.
Open Datasets | No | The paper defines a theoretical data distribution D for its model and cites existing datasets in its related work and references, but it does not use, or provide access information for, any real-world dataset in support of its main findings.
Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and conditions rather than describing an experimental setup with hyperparameters or training details.
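Since the paper ships no code or pseudocode, the sketch below is a minimal toy illustration of the setting its abstract describes: a two-patch "signal-noise" data model and a two-layer CNN with polynomial ReLU filters trained by full-batch gradient descent on the logistic loss. Every concrete choice here (dimensions, signal strength, learning rate, cubic activation, initialization scale) is an assumption of ours for illustration, not the authors' specification.

```python
import numpy as np

# Toy sketch (our assumptions, not the authors' code): two-patch data
# (y * mu, xi) with xi ~ N(0, I), and a two-layer CNN whose output is
# positive-class filters minus negative-class filters with activation
# sigma(z) = max(z, 0) ** q, trained by gradient descent.
rng = np.random.default_rng(0)
d, n, m, q = 50, 20, 10, 3       # patch dim, samples, filters per class, power

mu = np.zeros(d)
mu[0] = 3.0                      # signal strength (controls the SNR)
y = rng.choice([-1.0, 1.0], size=n)
patches = [y[:, None] * mu,                  # signal patch
           rng.normal(0.0, 1.0, (n, d))]     # noise patch

def f(Wp, Wn):
    """Network output, summed over both patches and all filters."""
    out = np.zeros(n)
    for P in patches:
        out += (np.maximum(P @ Wp.T, 0.0) ** q).sum(1)
        out -= (np.maximum(P @ Wn.T, 0.0) ** q).sum(1)
    return out / m

def loss(Wp, Wn):
    # logistic loss on the margins y * f(x)
    return np.mean(np.log1p(np.exp(-y * f(Wp, Wn))))

def grads(Wp, Wn):
    coef = -y / (1.0 + np.exp(y * f(Wp, Wn))) / n   # dL/df per example
    gp, gn = np.zeros_like(Wp), np.zeros_like(Wn)
    for P in patches:
        gp += (q * np.maximum(P @ Wp.T, 0.0) ** (q - 1) * coef[:, None]).T @ P / m
        gn -= (q * np.maximum(P @ Wn.T, 0.0) ** (q - 1) * coef[:, None]).T @ P / m
    return gp, gn

Wp = rng.normal(0.0, 0.1, (m, d))   # small random initialization
Wn = rng.normal(0.0, 0.1, (m, d))
loss_before = loss(Wp, Wn)
for _ in range(300):                 # plain full-batch gradient descent
    gp, gn = grads(Wp, Wn)
    Wp -= 0.1 * gp
    Wn -= 0.1 * gn
loss_after = loss(Wp, Wn)
```

In this toy, varying `mu[0]` relative to the noise scale plays the role of the paper's signal-to-noise ratio; the paper's actual results concern the test loss of the interpolating solution, which this sketch does not measure.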