From Tempered to Benign Overfitting in ReLU Neural Networks

Authors: Guy Kornowski, Gilad Yehudai, Ohad Shamir

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical study for intermediate dimensions (Section 5). Following our theoretical results, we attempt to empirically bridge the gap between the one-dimensional and high-dimensional settings. In particular, it appears that the tempered overfitting behavior extends to a wider regime than what our theoretical results formally cover, and that the overfitting profile gradually shifts from tempered to benign as the dimension increases. This substantially extends the prior empirical observation due to Mallinar et al. [2022, Figure 6b], which exhibited tempered overfitting in input dimension 10. In our experiments, we trained a fully connected neural network with 2 or 3 layers, width n (which will be specified later), and ReLU activations. We sampled a dataset (x_i, y_i)_{i=1}^m ~ Unif(S^{d-1}) × D_y for noise level p ∈ {0.05, 0.1, ..., 0.5}. We trained the network using SGD with a constant learning rate of 0.1 and with the logistic (cross-entropy) loss. Each experiment ran for a total of 20k epochs and was repeated 10 times with different random seeds; the plots are averaged over the runs.
Researcher Affiliation | Academia | Guy Kornowski (Weizmann Institute of Science, guy.kornowski@weizmann.ac.il); Gilad Yehudai (Weizmann Institute of Science, gilad.yehudai@weizmann.ac.il); Ohad Shamir (Weizmann Institute of Science, ohad.shamir@weizmann.ac.il)
Pseudocode | No | The paper contains mathematical derivations and proofs but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | No | We sampled a dataset (x_i, y_i)_{i=1}^m ~ Unif(S^{d-1}) × D_y for noise level p ∈ {0.05, 0.1, ..., 0.5}.
Dataset Splits | No | The paper describes training on a sampled dataset and mentions that the trained network correctly classified all training data, but it does not specify any explicit training/validation/test dataset splits, sample counts for splits, or cross-validation setup.
Hardware Specification | No | The paper describes training neural networks and running experiments but does not provide any specific details about the hardware used, such as GPU models, CPU models, or cloud resources.
Software Dependencies | No | The paper mentions using SGD and logistic loss for training, but it does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In our experiments, we trained a fully connected neural network with 2 or 3 layers, width n (which will be specified later), and ReLU activations. We sampled a dataset (x_i, y_i)_{i=1}^m ~ Unif(S^{d-1}) × D_y for noise level p ∈ {0.05, 0.1, ..., 0.5}. We trained the network using SGD with a constant learning rate of 0.1 and with the logistic (cross-entropy) loss. Each experiment ran for a total of 20k epochs and was repeated 10 times with different random seeds; the plots are averaged over the runs.
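The setup described above (inputs uniform on the unit sphere, noisy binary labels, a fully connected ReLU network trained with the logistic loss) can be sketched in plain numpy. This is a minimal illustration, not the authors' code: the clean labeling rule (sign of the first coordinate) and full-batch gradient updates are assumptions for the sketch, since the paper's label distribution D_y is not fully specified in this excerpt and the paper uses SGD.

```python
import numpy as np

def sample_dataset(m, d, p, rng):
    """Sample x_i ~ Unif(S^{d-1}) with labels in {-1, +1} flipped w.p. p.
    The clean labeling rule (sign of the first coordinate) is a hypothetical
    stand-in for the paper's label distribution D_y."""
    x = rng.standard_normal((m, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # project onto the unit sphere
    y = np.sign(x[:, 0])                           # assumed clean labels
    flip = rng.random(m) < p                       # independent label noise
    y[flip] *= -1
    return x, y

def train_two_layer_relu(x, y, width, epochs, lr=0.1, seed=0):
    """Gradient descent on a 2-layer ReLU net with logistic loss
    log(1 + exp(-y f(x))). Full-batch updates simplify the paper's SGD."""
    rng = np.random.default_rng(seed)
    m, d = x.shape
    W = rng.standard_normal((d, width)) / np.sqrt(d)
    v = rng.standard_normal(width) / np.sqrt(width)
    for _ in range(epochs):
        h = np.maximum(x @ W, 0.0)             # hidden ReLU activations
        f = h @ v                              # network output f(x_i)
        g = -y / (1.0 + np.exp(y * f))         # dloss/df for each sample
        grad_v = h.T @ g / m
        grad_W = x.T @ (np.outer(g, v) * (h > 0)) / m
        v -= lr * grad_v
        W -= lr * grad_W
    return W, v

rng = np.random.default_rng(0)
x, y = sample_dataset(m=200, d=10, p=0.1, rng=rng)
W, v = train_two_layer_relu(x, y, width=512, epochs=2000)
train_acc = np.mean(np.sign(np.maximum(x @ W, 0.0) @ v) == y)
```

Sweeping `d` and `p` in this sketch, and measuring test error on a fresh sample, mirrors the paper's experiment of tracking how the overfitting profile changes with the input dimension.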