Detecting Overfitting via Adversarial Examples

Authors: Roman Werpachowski, András György, Csaba Szepesvári

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a specialized variant of our test for multiclass image classification, and apply it to testing overfitting of recent models to the popular ImageNet benchmark. Our method correctly indicates overfitting of the trained model to the training set, but is not able to detect any overfitting to the test set, in line with other recent work on this topic. To understand the behavior of our tests better, we first use them on a synthetic binary classification problem, where the tests are able to successfully identify the cases where overfitting is present. Then we apply our independence tests to state-of-the-art classification methods for the popular image classification benchmark, ImageNet [8]. (A schematic sketch of such a test appears after this table.)
Researcher Affiliation | Industry | Roman Werpachowski, András György, Csaba Szepesvári; DeepMind, London, UK; {romanw,agyorgy,szepi}@google.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | We applied our test to check if state-of-the-art classifiers for the ImageNet dataset [8] have been overfitted to the test set. In particular, we use the VGG16 classifier of [27] and the ResNet50 classifier of [16]. (A hedged model-loading sketch appears after this table.)
Dataset Splits | No | The paper discusses 'training set' and 'test set' but does not explicitly mention 'validation' splits or split percentages for any of the datasets used.
Hardware Specification | No | The paper mentions 'computational considerations' and 'computational resources' but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions using the VGG16 and ResNet50 models, which are deep learning architectures, but it does not specify any software environments, libraries, or their version numbers (e.g., TensorFlow, PyTorch, Python version).
Experiment Setup | Yes | The models were trained using the parameters recommended by their respective authors. The preprocessing procedure of both architectures involves rescaling every image so that the smaller of width and height is 256 and next cropping centrally to size 224 × 224. To control the amount of change, we limit the magnitude of translations and allow v ∈ V_ε = {u ∈ ℤ² : u ≠ (0, 0), ‖u‖ ≤ ε} only, for some fixed positive ε. We only analyzed a single trained VGG16 model, while the ResNet50 model was retrained 120 times.
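
The Open Datasets row above names VGG16 and ResNet50 classifiers for ImageNet. The paper does not state which software stack was used, so the following is only a minimal loading sketch that assumes PyTorch/torchvision; the public torchvision weights shown here are stand-ins, not the exact checkpoints of [27] and [16] evaluated in the paper.

```python
# Hedged sketch: load ImageNet-pretrained VGG16 and ResNet50 classifiers.
# Assumption: PyTorch/torchvision (not stated in the paper); these public
# weights are illustrative stand-ins for the models actually tested.
import torch
from torchvision import models

vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
resnet50 = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

with torch.no_grad():
    x = torch.zeros(1, 3, 224, 224)            # one preprocessed RGB image
    print(vgg16(x).shape, resnet50(x).shape)   # both produce (1, 1000) logits
```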
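
The Experiment Setup row quotes the preprocessing (shorter side rescaled to 256, central 224 × 224 crop) and the set V_ε of allowed translations. A minimal sketch of both follows, again assuming torchvision for the image transforms; the max-norm bound on the translation vector is an assumption, since the excerpt does not spell out which norm is used, and `translation_set` is a hypothetical helper name.

```python
# Sketch of the quoted preprocessing and the translation set V_eps.
# Assumptions: torchvision transforms; max-norm bound on the shift (the norm
# is not specified in the excerpt above).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),       # rescale so the shorter side is 256
    transforms.CenterCrop(224),   # central 224 x 224 crop
    transforms.ToTensor(),
])

def translation_set(eps: int):
    """All nonzero integer shifts u = (dx, dy) with max(|dx|, |dy|) <= eps."""
    return [(dx, dy)
            for dx in range(-eps, eps + 1)
            for dy in range(-eps, eps + 1)
            if (dx, dy) != (0, 0)]

V_eps = translation_set(3)        # e.g. eps = 3 gives 48 candidate translations
```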
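
The Research Type row above summarizes the paper's adversarial-example-based overfitting (independence) test. The schematic below only illustrates the general shape of such a test: compare the plain test error with a reweighted error on adversarially perturbed test points and ask whether the gap is statistically significant. The function name, the treatment of per-example importance weights as a given input, and the simple two-sided z-statistic are illustrative assumptions, not the paper's exact estimator or test statistic.

```python
# Schematic overfitting test: a significant gap between the reweighted
# adversarial error and the plain test error is taken as evidence that the
# model depends on the test set. Illustrative only; not the paper's statistic.
import numpy as np
from scipy import stats

def overfitting_test(errors, adv_errors, weights, alpha=0.05):
    """errors: 0/1 losses on original test points;
    adv_errors: 0/1 losses on their adversarially perturbed counterparts;
    weights: importance weights for the perturbed points (assumed given)."""
    errors, adv_errors, weights = map(np.asarray, (errors, adv_errors, weights))
    diff = weights * adv_errors - errors                  # per-example gap
    n = diff.size
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))     # standardized mean gap
    p_value = 2 * stats.norm.sf(abs(z))                   # two-sided normal tail
    return z, p_value, p_value < alpha                    # reject -> overfitting signal
```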