The Limitations of Adversarial Training and the Blind-Spot Attack

Authors: Huan Zhang*, Hongge Chen*, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on adversarially trained models by Madry et al. (2018) on three datasets: MNIST, Fashion MNIST, and CIFAR-10. For MNIST, we use the "secret" model release for the MNIST attack challenge. For CIFAR-10, we use the public adversarially trained model. For Fashion MNIST, we train our own model with the same model structure and parameters as the robust MNIST model, except that the iterative adversary is allowed to perturb each pixel by at most ϵ = 0.1, as a larger ϵ will significantly reduce model accuracy.
Researcher Affiliation | Academia | UCLA, Los Angeles, CA 90095; MIT, Cambridge, MA 02139; UT Austin, Austin, TX 78712
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper references third-party GitHub repositories for the models used (e.g., Madry Lab's models) but does not provide a link to the authors' own source code for the methodology or the blind-spot attack implementation.
Open Datasets | Yes | We conduct experiments on adversarially trained models by Madry et al. (2018) on three datasets: MNIST, Fashion MNIST, and CIFAR-10. ... We also studied the German Traffic Sign (GTS) (Houben et al., 2013) dataset.
Dataset Splits | No | The paper mentions using 'training data' and 'test set' but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for reproducibility, especially for the models the authors train themselves.
Hardware Specification | No | The paper does not mention any specific hardware (e.g., GPU or CPU models) used for running the experiments.
Software Dependencies | No | The paper mentions techniques like C&W attacks and PGD but does not list any specific software or library names with version numbers used for implementation (e.g., 'PyTorch 1.9', 'TensorFlow 2.0').
Experiment Setup | Yes | For Fashion MNIST, we train our own model with the same model structure and parameters as the robust MNIST model, except that the iterative adversary is allowed to perturb each pixel by at most ϵ = 0.1, as a larger ϵ will significantly reduce model accuracy. ... We use our presented simple blind-spot attack in Section 3.3 to find blind-spot images, and use Carlini & Wagner's (C&W's) ℓ∞ attack (Carlini & Wagner, 2017) to find their adversarial examples. ... For MNIST, ϵ = 0.3; for Fashion-MNIST, ϵ = 0.1; and for CIFAR, ϵ = 8/255. All input images are normalized to [-0.5, 0.5]. [See the sketch after this table.]
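
The Experiment Setup row combines the paper's simple blind-spot transform (Section 3.3) with a C&W ℓ∞ attack under per-dataset ϵ and inputs normalized to [-0.5, 0.5]. Below is a minimal sketch of the preprocessing step only, assuming the blind-spot transform scales and shifts pixel values (x' = αx + β) as described in the paper; the function names, parameter values, and NumPy implementation are illustrative, not the authors' released code.

```python
# Hypothetical sketch of the blind-spot preprocessing described in Section 3.3:
# scale each test image by alpha and shift it by beta (x' = alpha * x + beta),
# then normalize to the [-0.5, 0.5] range used in the paper before attacking.
import numpy as np

def blind_spot_transform(x: np.ndarray, alpha: float = 0.8, beta: float = 0.0) -> np.ndarray:
    """Scale-and-shift a batch of images in [0, 1]; clip to stay in the valid range."""
    return np.clip(alpha * x + beta, 0.0, 1.0)

def normalize(x: np.ndarray) -> np.ndarray:
    """Map images from [0, 1] to [-0.5, 0.5]."""
    return x - 0.5

# Example: prepare a placeholder MNIST-shaped batch before running a C&W l_inf attack
# with the per-dataset epsilon (0.3 for MNIST, 0.1 for Fashion-MNIST, 8/255 for CIFAR-10).
x_test = np.random.rand(16, 28, 28, 1)   # stand-in for real test images in [0, 1]
x_blind = normalize(blind_spot_transform(x_test, alpha=0.8, beta=0.0))
```

In the paper's setup, the C&W ℓ∞ attack would then be run on these transformed inputs against the adversarially trained model, with an attack counted as successful only if the resulting distortion stays within the per-dataset ϵ listed above.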