Benign, Tempered, or Catastrophic: Toward a Refined Taxonomy of Overfitting

Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: "We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning." It further notes: "In Section 4, we empirically study overfitting for DNNs. We give evidence that standard DNNs trained to interpolation exhibit tempered overfitting, not benign overfitting, motivating the further study of tempered overfitting in the pursuit of understanding modern machine learning methods." (A sketch of the taxonomy these claims refer to appears after the table.)
Researcher Affiliation | Collaboration | Neil Mallinar (UC San Diego, nmallina@ucsd.edu); James B. Simon (UC Berkeley, james.simon@berkeley.edu); Amirhesam Abedsoltan (UC San Diego, aabedsoltan@ucsd.edu); Parthe Pandit (UC San Diego, parthepandit@ucsd.edu); Mikhail Belkin (UC San Diego, mbelkin@ucsd.edu); Preetum Nakkiran (Apple & UC San Diego, preetum@apple.com)
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | Figure 2 depicts such an experiment: a ResNet is trained on a binary variant of CIFAR-10 with varying amounts of training label noise and with increasing sample size n. Figure 3 demonstrates the taxonomy experimentally for two benign methods (k-NN and early-stopped MLPs) and two tempered methods (1-NN and interpolating MLPs) on a binary classification version of MNIST, with varying noise in the train labels. Figure 9 shows noise profiles for Wide ResNets trained on a binary version of SVHN. The paper cites Krizhevsky et al. [2009] for CIFAR-10, LeCun et al. [1998] for MNIST, and Netzer et al. [2011] for SVHN, all of which are standard public datasets. (See the dataset-construction sketch after the table.)
Dataset Splits | No | The paper mentions a 'train set' and a 'test set' but does not explicitly describe a validation set or specific train/validation/test splits (e.g., percentages or sample counts) needed for reproduction. It refers to a 'clean test set', but no explicit validation split is given.
Hardware Specification | Yes | This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [Towns et al., 2014], which is supported by NSF grant number ACI-1548562, Expanse CPU/GPU compute nodes, and allocations TG-CIS210104 and TG-CIS220009.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with specific versions such as PyTorch 1.9 or TensorFlow 2.x) needed to replicate the experiments.
Experiment Setup | Yes | We provide full experimental details in Appendix C. Appendix C.2 (Training Details): All neural networks in this paper are trained using the Adam optimizer [Kingma and Ba, 2015] with a batch size of 500, a learning rate of 0.001, and a weight decay of 0.0001. All networks are trained for 10,000 epochs or until the training loss reached 10^-5. (A hedged training-loop sketch based on these hyperparameters follows the table.)
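
The Research Type row judges whether interpolating DNNs exhibit benign or tempered overfitting. As a concrete illustration of the taxonomy being applied, below is a minimal sketch (not from the paper) that labels an overfitting regime from an estimated asymptotic test error on a noisy binary task. The tolerance, the chance-level reference point, and the function name are assumptions made for illustration; the paper states the taxonomy in terms of exact asymptotic limits.

```python
def classify_overfitting(asymptotic_test_error: float,
                         bayes_error: float,
                         chance_error: float = 0.5,
                         tol: float = 0.01) -> str:
    """Heuristic labelling of an overfitting regime, following the paper's
    taxonomy: benign (risk approaches the Bayes risk), tempered (risk settles
    at a finite value strictly between Bayes and trivial), catastrophic (risk
    at or beyond the trivial predictor).

    `tol` is an arbitrary slack term for finite-sample estimates; it is an
    assumption, not part of the paper's definitions.
    """
    if asymptotic_test_error <= bayes_error + tol:
        return "benign"
    if asymptotic_test_error < chance_error - tol:
        return "tempered"
    return "catastrophic"


# Example with 20% training label noise on a balanced binary task: the Bayes
# error is 0.2, and 1-NN is known to approach roughly 2p(1-p) = 0.32.
print(classify_overfitting(0.20, bayes_error=0.20))  # benign (e.g. k-NN, early-stopped MLP)
print(classify_overfitting(0.32, bayes_error=0.20))  # tempered (e.g. 1-NN, interpolating MLP)
print(classify_overfitting(0.50, bayes_error=0.20))  # catastrophic
```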
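
The Open Datasets row references binary variants of CIFAR-10, MNIST, and SVHN with injected training label noise. Below is a hedged sketch of how such a dataset could be built with torchvision; the animals-vs-vehicles class grouping, the uniform label-flip model, and the helper name are assumptions for illustration, since the paper's exact construction is given in its Appendix C.

```python
import numpy as np
from torchvision import datasets, transforms

def binary_cifar10_with_label_noise(noise_rate: float, root: str = "./data",
                                     train: bool = True, seed: int = 0):
    """Build a binary CIFAR-10 variant and flip a fraction of the train labels.

    The grouping below (vehicles vs. animals) and the uniform flip model are
    illustrative assumptions; see the paper's Appendix C for its construction.
    """
    ds = datasets.CIFAR10(root=root, train=train, download=True,
                          transform=transforms.ToTensor())
    # CIFAR-10 classes 0, 1, 8, 9 are vehicles; the remaining six are animals.
    vehicle_classes = {0, 1, 8, 9}
    labels = np.array([0 if y in vehicle_classes else 1 for y in ds.targets])

    if train and noise_rate > 0:
        rng = np.random.default_rng(seed)
        flip = rng.random(len(labels)) < noise_rate
        labels[flip] = 1 - labels[flip]  # flip to the opposite binary label

    ds.targets = labels.tolist()
    return ds
```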
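
The Experiment Setup row quotes the hyperparameters from Appendix C.2. A minimal PyTorch training loop consistent with those numbers might look as follows; the loss function, device handling, and function name are assumptions beyond what the quoted text specifies, so this is a sketch rather than the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader

def train_to_interpolation(model, train_dataset, device="cuda",
                           max_epochs=10_000, loss_threshold=1e-5):
    """Training loop following the hyperparameters quoted from Appendix C.2:
    Adam, batch size 500, learning rate 1e-3, weight decay 1e-4, run for
    10,000 epochs or until the training loss falls below 1e-5.
    """
    model = model.to(device)
    loader = DataLoader(train_dataset, batch_size=500, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    criterion = torch.nn.CrossEntropyLoss()  # loss choice is an assumption

    for epoch in range(max_epochs):
        total_loss, n_examples = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * x.size(0)
            n_examples += x.size(0)
        if total_loss / n_examples < loss_threshold:
            break  # stop once the model effectively interpolates the train set
    return model
```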