Benign Overfitting in Two-layer Convolutional Neural Networks
Authors: Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss. These together demonstrate a sharp phase transition between benign overfitting and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks. |
| Researcher Affiliation | Academia | Yuan Cao, Department of Statistics & Actuarial Science and Department of Mathematics, The University of Hong Kong, yuancao@hku.hk; Zixiang Chen, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA, chenzx19@cs.ucla.edu; Mikhail Belkin, Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093, USA, mbelkin@ucsd.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA, qgu@cs.ucla.edu |
| Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, but it does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper defines a theoretical data distribution D for its model and cites existing datasets only in related work and references; it does not use, or provide access information for, any real-world dataset for training or empirical validation of its main findings. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and conditions rather than describing an experimental setup with hyperparameters or training details. |
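
To make the setting summarized in the Research Type row concrete, below is a minimal NumPy sketch (not the authors' code) of the two-patch signal-plus-noise data model and the two-layer CNN with polynomial ReLU activation studied in the paper, trained by full-batch gradient descent on the logistic loss. All concrete values here (dimensions, the signal vector `mu`, the noise level `sigma_p`, the exponent `q`, the step size, and the iteration count) are illustrative assumptions, not the paper's asymptotic regime.

```python
# Minimal sketch (assumptions, not the authors' implementation) of the setting in
# "Benign Overfitting in Two-layer Convolutional Neural Networks": each example has
# two patches, one carrying y*mu and one carrying Gaussian noise, and the learner is
# a two-layer CNN f(W,x) = (1/m) sum_{r,p} [sigma(<w_{+1,r},x_p>) - sigma(<w_{-1,r},x_p>)]
# with sigma(z) = max(z,0)^q, trained by full-batch gradient descent on logistic loss.
import numpy as np

rng = np.random.default_rng(0)

# --- illustrative problem sizes ---
n, d, m, q = 50, 200, 10, 3          # samples, patch dimension, filters per class, activation exponent
mu = np.zeros(d); mu[0] = 5.0        # fixed signal vector; its norm controls the signal-to-noise ratio
sigma_p = 1.0                        # noise standard deviation

def sample_data(n_samples):
    """Two patches per example: one is y*mu, the other is Gaussian noise orthogonal to mu."""
    y = rng.choice([-1.0, 1.0], size=n_samples)
    noise = sigma_p * rng.standard_normal((n_samples, d))
    noise -= (noise @ mu)[:, None] * mu[None, :] / (mu @ mu)   # the paper draws noise orthogonal to mu
    X = np.stack([y[:, None] * mu[None, :], noise], axis=1)    # shape (n_samples, 2 patches, d)
    return X, y

def act(z):        # polynomial ReLU
    return np.maximum(z, 0.0) ** q

def act_grad(z):   # derivative of the polynomial ReLU
    return q * np.maximum(z, 0.0) ** (q - 1)

def forward(W_pos, W_neg, X):
    z_pos = np.einsum('rd,npd->nrp', W_pos, X)
    z_neg = np.einsum('rd,npd->nrp', W_neg, X)
    return (act(z_pos).sum(axis=(1, 2)) - act(z_neg).sum(axis=(1, 2))) / m

X_train, y_train = sample_data(n)
X_test, y_test = sample_data(1000)

sigma_0 = 0.01                                    # small Gaussian initialization
W_pos = sigma_0 * rng.standard_normal((m, d))
W_neg = sigma_0 * rng.standard_normal((m, d))
eta = 0.05                                        # step size

for step in range(2000):
    f = forward(W_pos, W_neg, X_train)
    # logistic loss l(yf) = log(1 + exp(-yf)); its derivative is -1/(1 + exp(yf)),
    # written via tanh for numerical stability
    lp = -0.5 * (1.0 - np.tanh(0.5 * y_train * f))
    z_pos = np.einsum('rd,npd->nrp', W_pos, X_train)
    z_neg = np.einsum('rd,npd->nrp', W_neg, X_train)
    coef = (lp * y_train)[:, None, None] / (n * m)
    grad_pos = np.einsum('nrp,npd->rd', coef * act_grad(z_pos), X_train)
    grad_neg = -np.einsum('nrp,npd->rd', coef * act_grad(z_neg), X_train)
    W_pos -= eta * grad_pos
    W_neg -= eta * grad_neg

train_err = np.mean(np.sign(forward(W_pos, W_neg, X_train)) != y_train)
test_err = np.mean(np.sign(forward(W_pos, W_neg, X_test)) != y_test)
print(f"train error: {train_err:.3f}   test error: {test_err:.3f}")
```

Shrinking the norm of `mu` relative to the noise scale lowers the signal-to-noise ratio and, per the paper's result, should move the trained network from benign overfitting (near-zero test error despite interpolation) toward harmful overfitting (constant-level test error), though a small illustrative run like this need not match the asymptotic regime the theory describes.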