How Benign is Benign Overfitting?

Authors: Amartya Sanyal, Puneet K. Dokania, Varun Kanade, Philip Torr

Venue: ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We identify label noise as one of the causes for adversarial vulnerability, and provide theoretical and empirical evidence in support of this." "Summary of Theoretical Contributions." "Summary of Experimental Contributions." |
| Researcher Affiliation | Collaboration | Amartya Sanyal, Department of Computer Science, University of Oxford and The Alan Turing Institute, London, UK (amartya.sanyal@cs.ox.ac.uk); Varun Kanade, Department of Computer Science, University of Oxford and The Alan Turing Institute, London, UK (varunk@cs.ox.ac.uk); Puneet K. Dokania, Department of Engineering Science, University of Oxford and Five AI Limited (puneet@robots.ox.ac.uk); Philip H. S. Torr, Department of Engineering Science, University of Oxford (phst@robots.ox.ac.uk) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | "... on the standard datasets: MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky & Hinton, 2009), and on a lesser scale ImageNet" |
| Dataset Splits | No | The paper mentions using training and test data, but does not provide specific numerical train/validation/test splits (e.g., percentages or exact counts) for the datasets used in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | "We train a neural network with four fully connected layers followed by a softmax layer and minimize the cross-entropy loss using an SGD optimizer until the training error becomes zero. Then, we attack this network with a strong ℓ∞ PGD adversary (Madry et al., 2018) with ϵ = 64/255 for 400 steps with a step size of 0.01. The network is optimized with SGD with a batch size of 128, a learning rate of 0.1 for 60 epochs, and the learning rate is decreased to 0.01 after 50 epochs." |
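The quoted experiment setup maps directly onto a short script. Below is a minimal PyTorch sketch of the four-layer fully connected network, the SGD schedule, and the ℓ∞ PGD attack as described in that row; the hidden-layer widths, the MNIST-style input dimension, and the [0, 1] pixel range are assumptions not stated in the excerpt, not details taken from the paper.

```python
# Minimal sketch of the quoted setup. The four-layer MLP, the SGD schedule
# (batch 128, lr 0.1 dropped to 0.01 after 50 of 60 epochs), and the l_inf PGD
# attack (eps = 64/255, 400 steps, step size 0.01) follow the quote; hidden
# widths, input dimension, and the [0, 1] input range are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def four_layer_mlp(in_dim=28 * 28, hidden=512, num_classes=10):
    # Four fully connected layers; the softmax is folded into the
    # cross-entropy loss, so the network ends in raw logits.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, num_classes),
    )

def pgd_linf(model, x, y, eps=64 / 255, steps=400, step_size=0.01):
    # l_inf PGD (Madry et al., 2018): signed-gradient ascent on the loss,
    # projected back into the eps-ball around x after every step.
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
            delta = (x + delta).clamp(0, 1) - x  # keep inputs in [0, 1]
    return (x + delta).detach()

def train(model, loader, epochs=60):
    # SGD with lr 0.1, decreased to 0.01 after 50 epochs, as in the quote.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(epochs):
        if epoch == 50:
            for group in opt.param_groups:
                group["lr"] = 0.01
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
```

Robust accuracy under this attack is then the fraction of test points for which `model(pgd_linf(model, x, y)).argmax(1)` still equals `y`.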