On the Vulnerability of Adversarially Trained Models Against Two-faced Attacks

Authors: Shengjie Zhou, Lue Tao, Yuzhou Cao, Tao Xiang, Bo An, Lei Feng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To provide a comprehensive study on two-faced attacks, we verify the adversarial robustness of various network architectures (e.g., ResNet-18 and WideResNet-28-10) on multiple benchmark datasets including CIFAR-10, SVHN (Netzer et al., 2011), CIFAR-100 (Krizhevsky, 2009), and Tiny-ImageNet (Yao et al., 2015), employing different adversarial training methods including PGDAT (Madry et al., 2018), TRADES (Zhang et al., 2019), MART (Wang et al., 2020b), FAT (Zhang et al., 2020), and THRM (Tao et al., 2022b). In addition, we validate various off-the-shelf adversarially trained models from RobustBench (Croce et al., 2021), with diverse architectures and training strategies. We also evaluate the threat of two-faced attacks against different robustness verification methods and examine the transferability of two-faced attacks. All experimental results consistently show that these models exhibit higher robustness under two-faced attacks compared with their actual robustness, which demonstrates that adversarially trained models are indeed vulnerable to two-faced attacks, and such attacks may be widespread.
Researcher Affiliation | Collaboration | 1. Chongqing University; 2. Nanjing University; 3. Nanyang Technological University; 4. Skywork AI
Pseudocode | Yes | Algorithm 1 Two-Faced Examples Generation
Open Source Code | No | The paper does not explicitly state that source code for its methodology is provided or publicly available.
Open Datasets | Yes | We verify the adversarial robustness of various network architectures (e.g., ResNet-18 and WideResNet-28-10) on multiple benchmark datasets including CIFAR-10, SVHN (Netzer et al., 2011), CIFAR-100 (Krizhevsky, 2009), and Tiny-ImageNet (Yao et al., 2015)...
Dataset Splits | Yes | CIFAR-10 (Krizhevsky, 2009) is a widely used image dataset in computer vision research. It consists of 60,000 32×32 pixel color images (50,000 images for training and 10,000 images for testing)... We use the original test data set as the Clean validation data.
Hardware Specification | Yes | We conduct all experiments using NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using 'PyTorch' but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We train ResNet-18 (He et al., 2015), DenseNet-121 (Huang et al., 2016), and WideResNet-28-10 (Zagoruyko & Komodakis, 2016) models using SGD with a learning rate of 0.1, momentum of 0.9, and weight decay of 5 × 10^-4. Additionally, the MLP and VGG-16 (Simonyan & Zisserman, 2014) models are trained with SGD using a learning rate of 0.01, momentum of 0.9, and weight decay of 5 × 10^-4. For all architectures, the number of training epochs is fixed at 110 with batch size 128, and the learning rate is decayed by a factor of 0.1 at the 100th and 105th epochs.
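
The learning-rate schedule quoted in the Experiment Setup row can be sketched in plain Python. This is a minimal illustration, not the authors' code: the names `TRAINING_CONFIG` and `learning_rate` are hypothetical, and the sketch assumes epochs are counted from 1 and that each decay takes effect at the start of the named epoch (the quoted text does not say which convention the paper uses). In PyTorch, `torch.optim.lr_scheduler.MultiStepLR` provides a comparable step decay.

```python
# Hyperparameters as quoted in the Experiment Setup row.
# The dictionary layout and key names are illustrative assumptions.
TRAINING_CONFIG = {
    "epochs": 110,
    "batch_size": 128,
    "momentum": 0.9,
    "weight_decay": 5e-4,
    # Base learning rate depends on the architecture family.
    "base_lr": {
        "ResNet-18": 0.1,
        "DenseNet-121": 0.1,
        "WideResNet-28-10": 0.1,
        "MLP": 0.01,
        "VGG-16": 0.01,
    },
    "decay_epochs": (100, 105),
    "decay_factor": 0.1,
}


def learning_rate(epoch, base_lr=0.1, milestones=(100, 105), gamma=0.1):
    """Return the step-decayed learning rate for a 1-indexed epoch.

    Assumption: the decay applies from the start of each milestone epoch,
    so epoch 100 already trains with base_lr * gamma.
    """
    # Count how many milestones have been reached by this epoch.
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


# Full per-epoch schedule for a ResNet-18 run (epochs 1 through 110).
schedule = [
    learning_rate(
        epoch,
        base_lr=TRAINING_CONFIG["base_lr"]["ResNet-18"],
        milestones=TRAINING_CONFIG["decay_epochs"],
        gamma=TRAINING_CONFIG["decay_factor"],
    )
    for epoch in range(1, TRAINING_CONFIG["epochs"] + 1)
]
```

Under these assumptions, a model with base learning rate 0.1 trains at 0.1 for epochs 1 to 99, at roughly 0.01 for epochs 100 to 104, and at roughly 0.001 for epochs 105 to 110; the MLP and VGG-16 runs follow the same shape starting from 0.01.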