The Limitations of Adversarial Training and the Blind-Spot Attack
Authors: Huan Zhang*, Hongge Chen*, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on adversarially trained models by Madry et al. (2018) on three datasets: MNIST, Fashion MNIST, and CIFAR-10. For MNIST, we use the "secret" model released for the MNIST attack challenge. For CIFAR-10, we use the public adversarially trained model. For Fashion MNIST, we train our own model with the same model structure and parameters as the robust MNIST model, except that the iterative adversary is allowed to perturb each pixel by at most ϵ = 0.1, as a larger ϵ would significantly reduce model accuracy. |
| Researcher Affiliation | Academia | UCLA, Los Angeles, CA 90095; MIT, Cambridge, MA 02139; UT Austin, Austin, TX 78712 |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references third-party GitHub repositories for models used (e.g., Madry Lab's models) but does not provide a link to the authors' own source code for the methodology or blind-spot attack implementation. |
| Open Datasets | Yes | We conduct experiments on adversarially trained models by Madry et al. (2018) on three datasets: MNIST, Fashion MNIST, and CIFAR-10. ... We also studied the German Traffic Sign (GTS) (Houben et al., 2013) dataset. |
| Dataset Splits | No | The paper mentions using 'training data' and 'test set' but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for reproducibility, especially for the models they train themselves. |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU, CPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions techniques like C&W attacks and PGD but does not list any specific software or library names with version numbers used for implementation (e.g., 'PyTorch 1.9', 'TensorFlow 2.0'). |
| Experiment Setup | Yes | For Fashion MNIST, we train our own model with the same model structure and parameters as the robust MNIST model, except that the iterative adversary is allowed to perturb each pixel by at most ϵ = 0.1, as a larger ϵ would significantly reduce model accuracy. ... We use our presented simple blind-spot attack in Section 3.3 to find blind-spot images, and use Carlini & Wagner's (C&W's) ℓ∞ attack (Carlini & Wagner, 2017) to find their adversarial examples. ... For MNIST, ϵ = 0.3; for Fashion-MNIST, ϵ = 0.1; and for CIFAR, ϵ = 8/255. All input images are normalized to [−0.5, 0.5]. A minimal sketch of this setup appears after the table. |
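The Experiment Setup row quotes the paper's procedure: scale and shift a test image toward a sparsely populated region of the training distribution (the "blind spot"), then run a C&W ℓ∞ attack on the transformed image and compare the resulting distortion against the per-dataset ϵ thresholds. The snippet below is a minimal sketch of the scale-and-shift step only, assuming inputs normalized to [−0.5, 0.5] as quoted above; the function name, the example α/β values, and the use of NumPy are illustrative assumptions, not the authors' released code (the paper does not provide source code).

```python
import numpy as np

def blind_spot_transform(x, alpha=0.8, beta=0.0):
    """Scale and shift an image toward a low-density region of the
    training distribution, in the spirit of the paper's simple
    blind-spot attack (Section 3.3).

    Assumes inputs are normalized to [-0.5, 0.5]; alpha and beta
    values here are illustrative, not taken from the paper.
    """
    x_prime = alpha * x + beta
    # Keep the transformed image inside the valid input range.
    return np.clip(x_prime, -0.5, 0.5)

# Example: shrink pixel intensities of an MNIST-sized image by 20%.
x = np.random.uniform(-0.5, 0.5, size=(28, 28, 1)).astype(np.float32)
x_blind = blind_spot_transform(x, alpha=0.8, beta=0.0)

# A C&W l-inf adversarial example found for x_blind would then be compared
# against the quoted per-dataset thresholds (e.g., eps = 0.3 for MNIST).
```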