Benign Overfitting in Two-layer Convolutional Neural Networks

Authors: Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss. These together demonstrate a sharp phase transition between benign overfitting and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks.
Researcher Affiliation | Academia | Yuan Cao, Department of Statistics & Actuarial Science and Department of Mathematics, The University of Hong Kong (yuancao@hku.hk); Zixiang Chen, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (chenzx19@cs.ucla.edu); Mikhail Belkin, Halicioğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093, USA (mbelkin@ucsd.edu); Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (qgu@cs.ucla.edu)
Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not mention providing open-source code for the described methodology.
Open Datasets | No | The paper defines a theoretical data distribution D for its model and cites existing datasets in its related work and references, but it does not use, or provide access information for, any real-world dataset in support of its main findings.
Dataset Splits | No | The paper is theoretical and does not describe experimental validation or dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and conditions rather than describing an experimental setup with hyperparameters or training details.
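Since the paper ships no code or pseudocode, the sketch below is a minimal toy illustration of the setting its abstract describes: a two-patch "signal-noise" data model and a two-layer CNN with polynomial ReLU filters trained by full-batch gradient descent on the logistic loss. Every concrete choice here (dimensions, signal strength, learning rate, cubic activation, initialization scale) is an assumption of ours for illustration, not the authors' specification.

```python
import numpy as np

# Toy sketch (our assumptions, not the authors' code): two-patch data
# (y * mu, xi) with xi ~ N(0, I), and a two-layer CNN whose output is
# positive-class filters minus negative-class filters with activation
# sigma(z) = max(z, 0) ** q, trained by gradient descent.
rng = np.random.default_rng(0)
d, n, m, q = 50, 20, 10, 3       # patch dim, samples, filters per class, power

mu = np.zeros(d)
mu[0] = 3.0                      # signal strength (controls the SNR)
y = rng.choice([-1.0, 1.0], size=n)
patches = [y[:, None] * mu,                  # signal patch
           rng.normal(0.0, 1.0, (n, d))]     # noise patch

def f(Wp, Wn):
    """Network output, summed over both patches and all filters."""
    out = np.zeros(n)
    for P in patches:
        out += (np.maximum(P @ Wp.T, 0.0) ** q).sum(1)
        out -= (np.maximum(P @ Wn.T, 0.0) ** q).sum(1)
    return out / m

def loss(Wp, Wn):
    # logistic loss on the margins y * f(x)
    return np.mean(np.log1p(np.exp(-y * f(Wp, Wn))))

def grads(Wp, Wn):
    coef = -y / (1.0 + np.exp(y * f(Wp, Wn))) / n   # dL/df per example
    gp, gn = np.zeros_like(Wp), np.zeros_like(Wn)
    for P in patches:
        gp += (q * np.maximum(P @ Wp.T, 0.0) ** (q - 1) * coef[:, None]).T @ P / m
        gn -= (q * np.maximum(P @ Wn.T, 0.0) ** (q - 1) * coef[:, None]).T @ P / m
    return gp, gn

Wp = rng.normal(0.0, 0.1, (m, d))   # small random initialization
Wn = rng.normal(0.0, 0.1, (m, d))
loss_before = loss(Wp, Wn)
for _ in range(300):                 # plain full-batch gradient descent
    gp, gn = grads(Wp, Wn)
    Wp -= 0.1 * gp
    Wn -= 0.1 * gn
loss_after = loss(Wp, Wn)
```

In this toy, varying `mu[0]` relative to the noise scale plays the role of the paper's signal-to-noise ratio; the paper's actual results concern the test loss of the interpolating solution, which this sketch does not measure.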