Detecting Overfitting via Adversarial Examples

Authors: Roman Werpachowski, András György, Csaba Szepesvári

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a specialized variant of our test for multiclass image classification, and apply it to testing overfitting of recent models to the popular ImageNet benchmark. Our method correctly indicates overfitting of the trained model to the training set, but is not able to detect any overfitting to the test set, in line with other recent work on this topic. To understand the behavior of our tests better, we first use them on a synthetic binary classification problem, where the tests are able to successfully identify the cases where overfitting is present. Then we apply our independence tests to state-of-the-art classification methods for the popular image classification benchmark, ImageNet [8]. (A schematic sketch of such a test appears after this table.)
Researcher Affiliation | Industry | Roman Werpachowski, András György, Csaba Szepesvári; DeepMind, London, UK; {romanw,agyorgy,szepi}@google.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | We applied our test to check if state-of-the-art classifiers for the ImageNet dataset [8] have been overfitted to the test set. In particular, we use the VGG16 classifier of [27] and the ResNet50 classifier of [16]. (A hedged model-loading sketch appears after this table.)
Dataset Splits | No | The paper discusses 'training set' and 'test set' but does not explicitly mention 'validation' splits or split percentages for any of the datasets used.
Hardware Specification | No | The paper mentions 'computational considerations' and 'computational resources' but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions using the VGG16 and ResNet50 models, which are deep learning architectures, but it does not specify any software environments, libraries, or their version numbers (e.g., TensorFlow, PyTorch, Python version).
Experiment Setup | Yes | The models were trained using the parameters recommended by their respective authors. The preprocessing procedure of both architectures involves rescaling every image so that the smaller of width and height is 256 and next cropping centrally to size 224 × 224. To control the amount of change, we limit the magnitude of translations and allow v ∈ V_ε = {u ∈ ℤ² : u ≠ (0, 0), ‖u‖ ≤ ε} only, for some fixed positive ε. We only analyzed a single trained VGG16 model, while the ResNet50 model was retrained 120 times.
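
The Open Datasets row above names VGG16 and ResNet50 classifiers for ImageNet. The paper does not state which software stack was used, so the following is only a minimal loading sketch that assumes PyTorch/torchvision; the public torchvision weights shown here are stand-ins, not the exact checkpoints of [27] and [16] evaluated in the paper.

```python
# Hedged sketch: load ImageNet-pretrained VGG16 and ResNet50 classifiers.
# Assumption: PyTorch/torchvision (not stated in the paper); these public
# weights are illustrative stand-ins for the models actually tested.
import torch
from torchvision import models

vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
resnet50 = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

with torch.no_grad():
    x = torch.zeros(1, 3, 224, 224)            # one preprocessed RGB image
    print(vgg16(x).shape, resnet50(x).shape)   # both produce (1, 1000) logits
```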
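
The Experiment Setup row quotes the preprocessing (shorter side rescaled to 256, central 224 × 224 crop) and the set V_ε of allowed translations. A minimal sketch of both follows, again assuming torchvision for the image transforms; the max-norm bound on the translation vector is an assumption, since the excerpt does not spell out which norm is used, and `translation_set` is a hypothetical helper name.

```python
# Sketch of the quoted preprocessing and the translation set V_eps.
# Assumptions: torchvision transforms; max-norm bound on the shift (the norm
# is not specified in the excerpt above).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),       # rescale so the shorter side is 256
    transforms.CenterCrop(224),   # central 224 x 224 crop
    transforms.ToTensor(),
])

def translation_set(eps: int):
    """All nonzero integer shifts u = (dx, dy) with max(|dx|, |dy|) <= eps."""
    return [(dx, dy)
            for dx in range(-eps, eps + 1)
            for dy in range(-eps, eps + 1)
            if (dx, dy) != (0, 0)]

V_eps = translation_set(3)        # e.g. eps = 3 gives 48 candidate translations
```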
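
The Research Type row above summarizes the paper's adversarial-example-based overfitting (independence) test. The schematic below only illustrates the general shape of such a test: compare the plain test error with a reweighted error on adversarially perturbed test points and ask whether the gap is statistically significant. The function name, the treatment of per-example importance weights as a given input, and the simple two-sided z-statistic are illustrative assumptions, not the paper's exact estimator or test statistic.

```python
# Schematic overfitting test: a significant gap between the reweighted
# adversarial error and the plain test error is taken as evidence that the
# model depends on the test set. Illustrative only; not the paper's statistic.
import numpy as np
from scipy import stats

def overfitting_test(errors, adv_errors, weights, alpha=0.05):
    """errors: 0/1 losses on original test points;
    adv_errors: 0/1 losses on their adversarially perturbed counterparts;
    weights: importance weights for the perturbed points (assumed given)."""
    errors, adv_errors, weights = map(np.asarray, (errors, adv_errors, weights))
    diff = weights * adv_errors - errors                  # per-example gap
    n = diff.size
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))     # standardized mean gap
    p_value = 2 * stats.norm.sf(abs(z))                   # two-sided normal tail
    return z, p_value, p_value < alpha                    # reject -> overfitting signal
```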