Towards Defending against Adversarial Examples via Attack-Invariant Features
Authors: Dawei Zhou, Tongliang Liu, Bo Han, Nannan Wang, Chunlei Peng, Xinbo Gao
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate that our method could provide better protection in comparison to previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks. In this section, we first introduce the datasets used in this paper (Section 4.1). We next show and analyze the experimental results of defending against pixel-constrained and spatially-constrained attacks on visual classification tasks, especially against adaptive attacks (Section 4.2). |
| Researcher Affiliation | Academia | ¹State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University; ²Trustworthy Machine Learning Lab, School of Computer Science, The University of Sydney; ³Department of Computer Science, Hong Kong Baptist University; ⁴State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University; ⁵Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications. |
| Pseudocode | Yes | Algorithm 1 ARN: Adversarial Noise Removing Network |
| Open Source Code | Yes | The code is available at https://github.com/dwDavidxd/ARN. |
| Open Datasets | Yes | We verify the effectiveness of our method on three popular benchmark datasets, i.e., MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky et al., 2009), and LISA (Jensen et al., 2016). |
| Dataset Splits | Yes | MNIST and CIFAR-10 both have 10 classes of images, but the former contains 60,000 training images and 10,000 test images, and the latter contains 50,000 training images and 10,000 test images. To alleviate the problem of imbalance and extremely blurry data in LISA, we picked the 16 best-quality signs, with 3,509 training images and 1,148 test images, from a subset which contains 47 different U.S. traffic signs (Eykholt et al., 2018; Wu et al., 2020b). (A hedged data-loading sketch follows the table.) |
| Hardware Specification | Yes | For fair comparison, all experiments are conducted on four NVIDIA RTX 2080 GPUs. |
| Software Dependencies | No | All methods are implemented in PyTorch. We use the implementation codes of PGD, DDN, CW, JSMA and STA in the advertorch toolbox (Ding et al., 2019) and the implementation codes of RP2, FWA and AA provided by their authors. Specific version numbers for PyTorch or advertorch are not provided. |
| Experiment Setup | Yes | Learning rates for the encoder network, the decoder network, the attack discriminator and the image discriminator are all set to 10^-4, with λ1 = 10^-1, λ2 = 10^-2, θ = 10^-1 for MNIST, and λ1 = 10^2, λ2 = 10^1, θ = 10^2 for CIFAR-10 and LISA. (A hedged setup sketch follows the table.) |
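
The MNIST and CIFAR-10 splits quoted in the Dataset Splits row (60,000/10,000 and 50,000/10,000 train/test images) match the default torchvision splits, so a minimal loading sketch can reproduce them directly. The LISA curation (16 sign classes, 3,509/1,148 images) is paper-specific; the `ImageFolder` path below is a hypothetical placeholder for a pre-curated directory, not something the paper or this report provides.

```python
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()

# Default torchvision splits reproduce the paper's reported
# 60,000/10,000 (MNIST) and 50,000/10,000 (CIFAR-10) train/test sizes.
mnist_train = torchvision.datasets.MNIST("data", train=True, download=True, transform=transform)
mnist_test = torchvision.datasets.MNIST("data", train=False, download=True, transform=transform)

cifar_train = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=transform)

# LISA is not packaged in torchvision; the paper curates the 16 best-quality
# sign classes (3,509 train / 1,148 test) from a 47-class subset. Assuming a
# pre-curated train/test directory layout, a generic loader might look like:
lisa_train = torchvision.datasets.ImageFolder("data/lisa_curated/train", transform=transform)
lisa_test = torchvision.datasets.ImageFolder("data/lisa_curated/test", transform=transform)
```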
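The hyperparameters in the Experiment Setup row can be collected into a small configuration sketch. Only the 10^-4 learning rate and the per-dataset λ1, λ2, θ values come from the quoted text; the optimizer choice (Adam), the placeholder network definitions, and the way the weights enter the ARN objective are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the paper's four networks; the quoted setup does
# not specify architectures, so these are minimal CIFAR-ish (3x32x32) modules.
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())
attack_disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
image_disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))

# All four learning rates are 10^-4 per the paper; Adam is an assumption,
# since the optimizer is not named in the quoted text.
optimizers = [
    torch.optim.Adam(net.parameters(), lr=1e-4)
    for net in (encoder, decoder, attack_disc, image_disc)
]

# Per-dataset loss weights, as reconstructed from the paper's setup.
HPARAMS = {
    "mnist":   {"lambda1": 1e-1, "lambda2": 1e-2, "theta": 1e-1},
    "cifar10": {"lambda1": 1e2,  "lambda2": 1e1,  "theta": 1e2},
    "lisa":    {"lambda1": 1e2,  "lambda2": 1e1,  "theta": 1e2},
}
hp = HPARAMS["cifar10"]

# How these weights combine the loss terms is an assumption about the ARN
# objective, not taken from the quoted text, e.g. something of the form:
#   total = recon + hp["lambda1"] * attack_disc_loss \
#         + hp["lambda2"] * image_disc_loss + hp["theta"] * invariance_term
```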