Mutual-Modality Adversarial Attack with Semantic Perturbation

Authors: Jingwen Ye, Ruonan Yu, Songhua Liu, Xinchao Wang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on several benchmark datasets and demonstrate that our mutual-modal attack strategy can effectively produce high-transferable attacks, which are stable regardless of the target networks. Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
Researcher Affiliation | Academia | Jingwen Ye, Ruonan Yu, Songhua Liu, Xinchao Wang; National University of Singapore; jingweny@nus.edu.sg, {ruonan,songhua.liu}@u.nus.sg, xinchao@nus.edu.sg
Pseudocode | No | With the iterative training of G and P, we obtain the final generative perturbation network G; the whole algorithm is given in the supplementary. (An illustrative training-loop skeleton follows this table.)
Open Source Code | No | The paper does not include an unambiguous statement of code release or a direct link to a source code repository for the methodology described.
Open Datasets | Yes | We evaluate attacks using two popular datasets in adversarial examples research, which are the CIFAR-10 dataset (Krizhevsky 2009) and the ImageNet dataset (Russakovsky et al. 2014).
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing. While Table 1 mentions 'Train/Val', the specific proportions are not defined.
Hardware Specification | Yes | A total of 10 iterations (we set the NUMG to be 2) are used to train the whole network, which costs about 8 hours on one NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions 'We used PyTorch framework for the implementation' but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup | Yes | We used PyTorch framework for the implementation. In the normal setting of using the pre-trained CLIP as the surrogate model, we choose the ViT/32 as backbone. As for the generator, we choose to use the ResNet backbone, and set the learning rate to be 0.0001 with Adam optimizer. All images are scaled to 224 × 224 to train the generator. For the ℓ∞ bound, we set ϵ = 0.04. A total of 10 iterations (we set the NUMG to be 2) are used to train the whole network... (A hedged configuration sketch follows this table.)
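
The Experiment Setup row reports enough detail for a rough reconstruction of the configuration. The PyTorch snippet below is a minimal sketch of only what is stated there: a pre-trained CLIP ViT-B/32 surrogate (loaded here with OpenAI's `clip` package), a ResNet-style perturbation generator, Adam at learning rate 1e-4, 224 × 224 inputs, and an ℓ∞ bound of ϵ = 0.04. The `ResBlock` and `PerturbationGenerator` modules are illustrative stand-ins introduced here; the paper's exact generator architecture and losses are not given in the main text.

```python
import torch
import torch.nn as nn
import clip  # OpenAI's CLIP package; assumed available (pip install git+https://github.com/openai/CLIP.git)

EPS = 0.04        # l_inf perturbation bound reported in the paper
IMG_SIZE = 224    # images are scaled to 224 x 224 before training the generator

class ResBlock(nn.Module):
    """Simple residual block; an illustrative stand-in for the paper's ResNet backbone."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

class PerturbationGenerator(nn.Module):
    """Illustrative generator G: clean image -> l_inf-bounded perturbation."""
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 3, 7, padding=3)

    def forward(self, x):
        delta = torch.tanh(self.tail(self.blocks(self.head(x))))  # values in [-1, 1]
        return EPS * delta  # bounded by construction, so no extra projection is needed

device = "cuda" if torch.cuda.is_available() else "cpu"
surrogate, _ = clip.load("ViT-B/32", device=device)  # pre-trained CLIP surrogate (ViT/32 backbone)
surrogate = surrogate.float().eval()                  # keep fp32 for this sketch

G = PerturbationGenerator().to(device)
optimizer = torch.optim.Adam(G.parameters(), lr=1e-4)  # learning rate reported in the paper
```

Because the generator's output passes through a tanh and is scaled by ϵ, every perturbation already satisfies the ℓ∞ bound; adding it to the clean image and clamping to [0, 1] yields the adversarial example.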
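Continuing from the sketch above (reusing G, surrogate, optimizer, device, IMG_SIZE), the schedule quoted in the Pseudocode and Hardware rows (10 iterations, NUMG = 2 generator updates per iteration) can be laid out as an alternating loop. The objective `stand_in_attack_loss`, the random batch, and the omission of the companion module P are all placeholders introduced here; the paper's actual mutual-modality loss and the update of P appear only in its supplementary material.

```python
import torch.nn.functional as F

def stand_in_attack_loss(model, x_clean, x_adv):
    """Placeholder untargeted objective: push adversarial CLIP image embeddings
    away from the clean ones. This is NOT the paper's mutual-modality loss."""
    with torch.no_grad():
        f_clean = model.encode_image(x_clean)
    f_adv = model.encode_image(x_adv)
    return F.cosine_similarity(f_adv, f_clean, dim=-1).mean()  # minimize similarity

NUM_ITERS, NUM_G = 10, 2  # '10 iterations' and 'NUMG = 2' as reported in the paper

for it in range(NUM_ITERS):
    for _ in range(NUM_G):  # NUM_G generator updates per outer iteration
        # A random batch stands in for a real CIFAR-10 / ImageNet loader;
        # CLIP's input normalization is omitted for brevity.
        x = torch.rand(8, 3, IMG_SIZE, IMG_SIZE, device=device)
        x_adv = (x + G(x)).clamp(0.0, 1.0)  # apply the bounded perturbation
        loss = stand_in_attack_loss(surrogate, x, x_adv)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # The alternating update of the companion module P is defined only in the
    # paper's supplementary material and is omitted from this skeleton.
```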