Benign, Tempered, or Catastrophic: Toward a Refined Taxonomy of Overfitting

Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: "We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning." It further notes: "In Section 4, we empirically study overfitting for DNNs. We give evidence that standard DNNs trained to interpolation exhibit tempered overfitting, not benign overfitting, motivating the further study of tempered overfitting in the pursuit of understanding modern machine learning methods." (A sketch of the taxonomy these claims refer to appears after the table.)
Researcher Affiliation | Collaboration | Neil Mallinar (UC San Diego, nmallina@ucsd.edu); James B. Simon (UC Berkeley, james.simon@berkeley.edu); Amirhesam Abedsoltan (UC San Diego, aabedsoltan@ucsd.edu); Parthe Pandit (UC San Diego, parthepandit@ucsd.edu); Mikhail Belkin (UC San Diego, mbelkin@ucsd.edu); Preetum Nakkiran (Apple & UC San Diego, preetum@apple.com)
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | Figure 2 depicts such an experiment: a ResNet is trained on a binary variant of CIFAR-10 with varying amounts of training label noise and with increasing sample size n. Figure 3 demonstrates the taxonomy experimentally for two benign methods (k-NN and early-stopped MLPs) and two tempered methods (1-NN and interpolating MLPs) on a binary classification version of MNIST, with varying noise in the train labels. Figure 9 shows noise profiles for Wide ResNets trained on a binary version of SVHN. The paper cites Krizhevsky et al. [2009] for CIFAR-10, LeCun et al. [1998] for MNIST, and Netzer et al. [2011] for SVHN, all of which are standard public datasets. (See the dataset-construction sketch after the table.)
Dataset Splits | No | The paper mentions a 'train set' and a 'test set' but does not explicitly describe a validation set or specific train/validation/test splits (e.g., percentages or sample counts) needed for reproduction. It refers to a 'clean test set', but no explicit validation split is given.
Hardware Specification | Yes | This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [Towns et al., 2014], which is supported by NSF grant number ACI-1548562, Expanse CPU/GPU compute nodes, and allocations TG-CIS210104 and TG-CIS220009.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with specific versions such as PyTorch 1.9 or TensorFlow 2.x) needed to replicate the experiments.
Experiment Setup | Yes | We provide full experimental details in Appendix C. Appendix C.2 (Training Details): All neural networks in this paper are trained using the Adam optimizer [Kingma and Ba, 2015] with a batch size of 500, a learning rate of 0.001, and a weight decay of 0.0001. All networks are trained for 10,000 epochs or until the training loss reached 10^-5. (A hedged training-loop sketch based on these hyperparameters follows the table.)
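
The Research Type row judges whether interpolating DNNs exhibit benign or tempered overfitting. As a concrete illustration of the taxonomy being applied, below is a minimal sketch (not from the paper) that labels an overfitting regime from an estimated asymptotic test error on a noisy binary task. The tolerance, the chance-level reference point, and the function name are assumptions made for illustration; the paper states the taxonomy in terms of exact asymptotic limits.

```python
def classify_overfitting(asymptotic_test_error: float,
                         bayes_error: float,
                         chance_error: float = 0.5,
                         tol: float = 0.01) -> str:
    """Heuristic labelling of an overfitting regime, following the paper's
    taxonomy: benign (risk approaches the Bayes risk), tempered (risk settles
    at a finite value strictly between Bayes and trivial), catastrophic (risk
    at or beyond the trivial predictor).

    `tol` is an arbitrary slack term for finite-sample estimates; it is an
    assumption, not part of the paper's definitions.
    """
    if asymptotic_test_error <= bayes_error + tol:
        return "benign"
    if asymptotic_test_error < chance_error - tol:
        return "tempered"
    return "catastrophic"


# Example with 20% training label noise on a balanced binary task: the Bayes
# error is 0.2, and 1-NN is known to approach roughly 2p(1-p) = 0.32.
print(classify_overfitting(0.20, bayes_error=0.20))  # benign (e.g. k-NN, early-stopped MLP)
print(classify_overfitting(0.32, bayes_error=0.20))  # tempered (e.g. 1-NN, interpolating MLP)
print(classify_overfitting(0.50, bayes_error=0.20))  # catastrophic
```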
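
The Open Datasets row references binary variants of CIFAR-10, MNIST, and SVHN with injected training label noise. Below is a hedged sketch of how such a dataset could be built with torchvision; the animals-vs-vehicles class grouping, the uniform label-flip model, and the helper name are assumptions for illustration, since the paper's exact construction is given in its Appendix C.

```python
import numpy as np
from torchvision import datasets, transforms

def binary_cifar10_with_label_noise(noise_rate: float, root: str = "./data",
                                     train: bool = True, seed: int = 0):
    """Build a binary CIFAR-10 variant and flip a fraction of the train labels.

    The grouping below (vehicles vs. animals) and the uniform flip model are
    illustrative assumptions; see the paper's Appendix C for its construction.
    """
    ds = datasets.CIFAR10(root=root, train=train, download=True,
                          transform=transforms.ToTensor())
    # CIFAR-10 classes 0, 1, 8, 9 are vehicles; the remaining six are animals.
    vehicle_classes = {0, 1, 8, 9}
    labels = np.array([0 if y in vehicle_classes else 1 for y in ds.targets])

    if train and noise_rate > 0:
        rng = np.random.default_rng(seed)
        flip = rng.random(len(labels)) < noise_rate
        labels[flip] = 1 - labels[flip]  # flip to the opposite binary label

    ds.targets = labels.tolist()
    return ds
```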
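
The Experiment Setup row quotes the hyperparameters from Appendix C.2. A minimal PyTorch training loop consistent with those numbers might look as follows; the loss function, device handling, and function name are assumptions beyond what the quoted text specifies, so this is a sketch rather than the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader

def train_to_interpolation(model, train_dataset, device="cuda",
                           max_epochs=10_000, loss_threshold=1e-5):
    """Training loop following the hyperparameters quoted from Appendix C.2:
    Adam, batch size 500, learning rate 1e-3, weight decay 1e-4, run for
    10,000 epochs or until the training loss falls below 1e-5.
    """
    model = model.to(device)
    loader = DataLoader(train_dataset, batch_size=500, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    criterion = torch.nn.CrossEntropyLoss()  # loss choice is an assumption

    for epoch in range(max_epochs):
        total_loss, n_examples = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * x.size(0)
            n_examples += x.size(0)
        if total_loss / n_examples < loss_threshold:
            break  # stop once the model effectively interpolates the train set
    return model
```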