From Tempered to Benign Overfitting in ReLU Neural Networks

Authors: Guy Kornowski, Gilad Yehudai, Ohad Shamir

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical study for intermediate dimensions (Section 5). Following our theoretical results, we attempt to empirically bridge the gap between the one-dimensional and high-dimensional settings. In particular, it appears that the tempered overfitting behavior extends to a wider regime than what our theoretical results formally cover, and that the overfitting profile gradually shifts from tempered to benign as the dimension increases. This substantially extends the prior empirical observation due to Mallinar et al. [2022, Figure 6b], which exhibited tempered overfitting in input dimension 10. In our experiments, we trained a fully connected neural network with 2 or 3 layers, width n (which will be specified later), and ReLU activations. We sampled a dataset (x_i, y_i)_{i=1}^m ~ Unif(S^{d-1}) × D_y for noise level p ∈ {0.05, 0.1, ..., 0.5}. We trained the network using SGD with a constant learning rate of 0.1 and with the logistic (cross-entropy) loss. Each experiment ran for a total of 20k epochs and was repeated 10 times with different random seeds; the plots are averaged over the runs.
Researcher Affiliation | Academia | Guy Kornowski (Weizmann Institute of Science, guy.kornowski@weizmann.ac.il); Gilad Yehudai (Weizmann Institute of Science, gilad.yehudai@weizmann.ac.il); Ohad Shamir (Weizmann Institute of Science, ohad.shamir@weizmann.ac.il)
Pseudocode | No | The paper contains mathematical derivations and proofs but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | No | We sampled a dataset (x_i, y_i)_{i=1}^m ~ Unif(S^{d-1}) × D_y for noise level p ∈ {0.05, 0.1, ..., 0.5}.
Dataset Splits | No | The paper describes training on a sampled dataset and mentions that the trained network correctly classified all training data, but it does not specify any explicit training/validation/test dataset splits, sample counts for splits, or cross-validation setup.
Hardware Specification | No | The paper describes training neural networks and running experiments but does not provide any specific details about the hardware used, such as GPU models, CPU models, or cloud resources.
Software Dependencies | No | The paper mentions using SGD and logistic loss for training, but it does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In our experiments, we trained a fully connected neural network with 2 or 3 layers, width n (which will be specified later), and ReLU activations. We sampled a dataset (x_i, y_i)_{i=1}^m ~ Unif(S^{d-1}) × D_y for noise level p ∈ {0.05, 0.1, ..., 0.5}. We trained the network using SGD with a constant learning rate of 0.1 and with the logistic (cross-entropy) loss. Each experiment ran for a total of 20k epochs and was repeated 10 times with different random seeds; the plots are averaged over the runs.
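The setup described above (inputs uniform on the unit sphere, noisy binary labels, a fully connected ReLU network trained with the logistic loss) can be sketched in plain numpy. This is a minimal illustration, not the authors' code: the clean labeling rule (sign of the first coordinate) and full-batch gradient updates are assumptions for the sketch, since the paper's label distribution D_y is not fully specified in this excerpt and the paper uses SGD.

```python
import numpy as np

def sample_dataset(m, d, p, rng):
    """Sample x_i ~ Unif(S^{d-1}) with labels in {-1, +1} flipped w.p. p.
    The clean labeling rule (sign of the first coordinate) is a hypothetical
    stand-in for the paper's label distribution D_y."""
    x = rng.standard_normal((m, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # project onto the unit sphere
    y = np.sign(x[:, 0])                           # assumed clean labels
    flip = rng.random(m) < p                       # independent label noise
    y[flip] *= -1
    return x, y

def train_two_layer_relu(x, y, width, epochs, lr=0.1, seed=0):
    """Gradient descent on a 2-layer ReLU net with logistic loss
    log(1 + exp(-y f(x))). Full-batch updates simplify the paper's SGD."""
    rng = np.random.default_rng(seed)
    m, d = x.shape
    W = rng.standard_normal((d, width)) / np.sqrt(d)
    v = rng.standard_normal(width) / np.sqrt(width)
    for _ in range(epochs):
        h = np.maximum(x @ W, 0.0)             # hidden ReLU activations
        f = h @ v                              # network output f(x_i)
        g = -y / (1.0 + np.exp(y * f))         # dloss/df for each sample
        grad_v = h.T @ g / m
        grad_W = x.T @ (np.outer(g, v) * (h > 0)) / m
        v -= lr * grad_v
        W -= lr * grad_W
    return W, v

rng = np.random.default_rng(0)
x, y = sample_dataset(m=200, d=10, p=0.1, rng=rng)
W, v = train_two_layer_relu(x, y, width=512, epochs=2000)
train_acc = np.mean(np.sign(np.maximum(x @ W, 0.0) @ v) == y)
```

Sweeping `d` and `p` in this sketch, and measuring test error on a fresh sample, mirrors the paper's experiment of tracking how the overfitting profile changes with the input dimension.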