Benign Overfitting in Two-layer ReLU Convolutional Neural Networks

Authors: Yiwen Kou, Zixiang Chen, Yuanzhou Chen, Quanquan Gu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our result also reveals a sharp transition between benign and harmful overfitting under different conditions on data distribution in terms of test risk. Experiments on synthetic data back up our theory.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>.
Pseudocode | No | I did not find any structured pseudocode or algorithm blocks in the paper.
Open Source Code | Yes | The code for our experiments can be found on GitHub: https://github.com/uclaml/BenignReLUCNN
Open Datasets | No | Here we generate synthetic data exactly following Definition 1.1. Definition 1.1. Let µ ∈ R^d be a fixed vector representing the signal contained in each data point... is generated from a distribution D, which we specify as follows:... The paper defines a synthetic data generation process rather than using an existing public dataset with concrete access information. (See the data-generation sketch after the table.)
Dataset Splits | No | I did not find specific information about validation dataset splits. The paper mentions "training data size n = 20" and "estimate the test error for each case using 1000 test data points."
Hardware Specification | No | I did not find any specific hardware details such as GPU or CPU models, or memory specifications. The paper only states general training parameters for the experiments.
Software Dependencies | No | We use the default initialization method in PyTorch to initialize the CNN parameters and train the CNN with full-batch gradient descent with a learning rate of 0.1 for 100 iterations. (PyTorch is mentioned, but no version number.)
Experiment Setup | Yes | The number of filters is set as m = 10. We use the default initialization method in PyTorch to initialize the CNN parameters and train the CNN with full-batch gradient descent with a learning rate of 0.1 for 100 iterations. (See the training sketch after the table.)
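
Since the paper generates its data synthetically "exactly following Definition 1.1" rather than releasing a dataset, the sketch below shows one way a reproducer might implement such a generator. It assumes the signal-noise patch model used in this line of work: each example has two d-dimensional patches, one carrying the signal scaled by the label and the other Gaussian noise drawn orthogonally to µ, with the observed label flipped independently with some probability. The function name generate_data, the choice of which patch holds the signal, and all parameter values are illustrative assumptions, not details quoted from the paper.

```python
import torch

def generate_data(n, d, mu, sigma_p, flip_prob, generator=None):
    """Signal-noise patch model sketch (assumed, not quoted from the paper)."""
    # Clean Rademacher labels; the observed label flips with probability flip_prob.
    y_clean = (torch.randint(0, 2, (n,), generator=generator) * 2 - 1).float()
    flips = torch.rand(n, generator=generator) < flip_prob
    y = torch.where(flips, -y_clean, y_clean)

    # Noise ~ N(0, sigma_p^2 (I - mu mu^T / ||mu||^2)): sample isotropic Gaussian
    # noise and project out the mu direction.
    xi = sigma_p * torch.randn(n, d, generator=generator)
    mu_unit = mu / mu.norm()
    xi = xi - torch.outer(xi @ mu_unit, mu_unit)

    # Two patches per example: a signal patch and a noise patch. Whether the
    # signal uses the clean or the observed label is a modelling assumption here.
    signal = y_clean.unsqueeze(1) * mu
    x = torch.stack([signal, xi], dim=1)   # shape (n, 2, d)
    return x, y
```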
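
The Experiment Setup row quotes m = 10 filters, default PyTorch initialization, full-batch gradient descent with learning rate 0.1 for 100 iterations, and a test error estimated on 1000 test points. The sketch below puts those quoted settings into a minimal PyTorch training loop. The TwoLayerReLUCNN class (two banks of m ReLU filters with a fixed ±1 second layer, averaged over filters and summed over patches) and the logistic loss are assumptions about the architecture, not code from the paper; the usage at the bottom assumes generate_data from the previous sketch is in the same file.

```python
import torch
import torch.nn.functional as F

class TwoLayerReLUCNN(torch.nn.Module):
    """Two-layer ReLU CNN sketch: m filters per class, fixed +/-1 second layer."""
    def __init__(self, d, m=10):
        super().__init__()
        # Default PyTorch initialization, as stated in the table above.
        self.pos = torch.nn.Linear(d, m, bias=False)
        self.neg = torch.nn.Linear(d, m, bias=False)
        self.m = m

    def forward(self, x):                       # x: (n, 2, d), two patches each
        f_pos = F.relu(self.pos(x)).sum(dim=(1, 2)) / self.m
        f_neg = F.relu(self.neg(x)).sum(dim=(1, 2)) / self.m
        return f_pos - f_neg                    # scalar score per example


def train_and_test(x_train, y_train, x_test, y_test, lr=0.1, iters=100, m=10):
    """Full-batch gradient descent with logistic loss (lr 0.1, 100 iterations)."""
    model = TwoLayerReLUCNN(d=x_train.shape[-1], m=m)
    opt = torch.optim.SGD(model.parameters(), lr=lr)      # plain full-batch GD
    for _ in range(iters):
        opt.zero_grad()
        loss = F.softplus(-y_train * model(x_train)).mean()   # logistic loss
        loss.backward()
        opt.step()
    with torch.no_grad():
        test_err = (torch.sign(model(x_test)) != y_test).float().mean().item()
    return test_err


if __name__ == "__main__":
    # Illustrative run with the sizes quoted above: n = 20 training points and
    # 1000 test points. mu, sigma_p, and flip_prob are placeholder values.
    d = 500
    mu = torch.zeros(d)
    mu[0] = 5.0
    x_tr, y_tr = generate_data(20, d, mu, sigma_p=1.0, flip_prob=0.1)
    x_te, y_te = generate_data(1000, d, mu, sigma_p=1.0, flip_prob=0.1)
    print("test error:", train_and_test(x_tr, y_tr, x_te, y_te))
```

torch.optim.SGD with no momentum is used here only as a convenient way to express plain full-batch gradient descent over the whole training set at every step.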