How Benign is Benign Overfitting?
Authors: Amartya Sanyal, Puneet K. Dokania, Varun Kanade, Philip Torr
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states: "We identify label noise as one of the causes for adversarial vulnerability, and provide theoretical and empirical evidence in support of this," and includes sections titled "Summary of Theoretical Contributions" and "Summary of Experimental Contributions." |
| Researcher Affiliation | Collaboration | Amartya Sanyal (Department of Computer Science, University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK; amartya.sanyal@cs.ox.ac.uk); Varun Kanade (Department of Computer Science, University of Oxford, Oxford, UK; The Alan Turing Institute, London, UK; varunk@cs.ox.ac.uk); Puneet K. Dokania (Department of Engineering Science, University of Oxford, Oxford, UK; Five AI Limited; puneet@robots.ox.ac.uk); Philip H.S. Torr (Department of Engineering Science, University of Oxford, Oxford, UK; phst@robots.ox.ac.uk) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | The paper evaluates "on the standard datasets: MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky & Hinton, 2009), and on a lesser scale Imagenet". A hedged data-loading sketch follows the table. |
| Dataset Splits | No | The paper mentions using training and test data, but does not provide specific numerical train/validation/test splits (e.g., percentages or exact counts) for the datasets used in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We train a neural network with four fully connected layers followed by a softmax layer and minimize the cross-entropy loss using an SGD optimizer until the training error becomes zero. Then, we attack this network with a strong ℓ∞ PGD adversary (Madry et al., 2018) with ϵ = 64/255 for 400 steps with a step size of 0.01. The network is optimized with SGD with a batch size of 128 and a learning rate of 0.1 for 60 epochs; the learning rate is decreased to 0.01 after 50 epochs. A hedged code sketch of this setup follows the table. |
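
The quoted experiment setup is concrete enough to sketch. The PyTorch reconstruction below is a minimal sketch, not the authors' code: the hidden width (512), ReLU activations, and MNIST-style input dimension are assumptions not stated in the quote, while the PGD parameters (ϵ = 64/255, 400 steps, step size 0.01) and the optimizer schedule (SGD, batch size 128, learning rate 0.1 for 60 epochs, dropped to 0.01 after epoch 50) follow the quoted text.

```python
# Hedged sketch (not the authors' released code) of the setup quoted in the
# "Experiment Setup" row. Architecture details beyond "four fully connected
# layers followed by a softmax" are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_mlp(in_dim=28 * 28, hidden=512, n_classes=10):
    # Four fully connected layers; the final softmax is left implicit because
    # nn.CrossEntropyLoss / F.cross_entropy expect raw logits.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_classes),
    )


def pgd_linf(model, x, y, eps=64 / 255, steps=400, step_size=0.01):
    """l_inf PGD attack (Madry et al., 2018) with the reported parameters."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()      # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project to eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                # keep valid pixels
    return x_adv.detach()


def train(model, loader, epochs=60, device="cpu"):
    # SGD with lr 0.1, decayed to 0.01 after epoch 50; batch size 128 is set
    # in the DataLoader. The paper trains until training error reaches zero.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[50], gamma=0.1)
    model.to(device)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        sched.step()
```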
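The "Open Datasets" row names MNIST, CIFAR10, and ImageNet. The usage sketch below loads MNIST via torchvision (an assumption; the paper does not say how the data was obtained) and exercises the pieces defined above with the reported batch size of 128.

```python
# Hedged usage sketch: MNIST via torchvision, then train and attack.
import torch
from torchvision import datasets, transforms

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = make_mlp()
train(model, loader, epochs=60)

x, y = next(iter(loader))
x_adv = pgd_linf(model, x, y)  # eps=64/255, 400 steps, step size 0.01
print((model(x_adv).argmax(1) == y).float().mean())  # adversarial accuracy
```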