Boosting the Transferability of Video Adversarial Examples via Temporal Translation
Authors: Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples. For transfer-based attack against video recognition models, it achieves a 61.56% average attack success rate on the Kinetics-400 and 48.60% on the UCF-101. |
| Researcher Affiliation | Academia | 1Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University 2Shanghai Collaborative Innovation Center on Intelligent Visual Computing zpwei21@m.fudan.edu.cn, {chenjingjing, zxwu, ygj}@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1: Temporal translation (TT) attack. Input: loss function J, clean video x, ground-truth class y. Parameters: perturbation budget ϵ, iteration number I, shift L, weight matrix W. Output: the adversarial example. 1: x_0 ← x; 2: α ← ϵ/I; 3: for i = 0 to I − 1 do; 4: x_{i+1} = clip_{x,ϵ}(x_i + α·g); 5: end for; 6: return x_I. (A hedged runnable sketch of this loop is given after the table.) |
| Open Source Code | Yes | Code is available at https://github.com/zhipeng-wei/TT. |
| Open Datasets | Yes | We evaluate our approach using UCF-101 (Soomro, Zamir, and Shah 2012) and Kinetics-400 datasets (Kay et al. 2017), which are widely used datasets for video recognition. |
| Dataset Splits | No | The paper mentions using 'the Kinetics-400 validation dataset' for evaluation, but it does not specify the explicit proportions (e.g., percentages or counts) of the training, validation, and test splits for the datasets (UCF-101 and Kinetics-400) used to train the models, nor does it cite a standard split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions that models were 'trained on the RGB domain'. |
| Software Dependencies | No | The paper mentions that the models used are 'implemented in https://cv.gluon.ai/model_zoo/action_recognition.html', implying the use of GluonCV. However, it does not provide specific version numbers for any software dependencies, such as Python, PyTorch/MXNet, or GluonCV itself. |
| Experiment Setup | Yes | In our experiments, video recognition models with ResNet-101 as its backbone are used as whitebox models for adversarial example generation. We set the maximum perturbation as ϵ = 16 for all experiments. For the iterative attack, we set the iteration number to I = 10, and thus the step size α = 1.6. For our method, the shift length L is set as 7, the weight matrix W is generated with Gaussian function, and the adjacent shift is adopted in the temporal translation. Input clips are formed by randomly cropping out 64 consecutive frames from videos and then skipping every other frame. The spatial size of the input is 224 × 224. (A hedged configuration sketch based on these values follows the table.) |
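
The pseudocode row above only shows the outer iteration of Algorithm 1. Below is a minimal PyTorch sketch of how that loop could be run, assuming a clip tensor of shape (B, C, T, H, W) in the [0, 1] range (so ϵ = 16 becomes 16/255), `torch.roll` for the temporal translation, uniform fallback weights in place of the Gaussian weight matrix W, and a `sign()` step as in I-FGSM. These choices and the function name `tt_attack` are our assumptions, not the authors' exact implementation; see https://github.com/zhipeng-wei/TT for the released code.

```python
import torch
import torch.nn.functional as F

def tt_attack(model, x, y, eps=16 / 255.0, iters=10, shift=7, weights=None):
    """Iterative transfer attack whose gradient is averaged over temporal shifts."""
    alpha = eps / iters                       # step size alpha = eps / I
    offsets = list(range(-shift, shift + 1))  # temporal translations -L..L
    if weights is None:
        # Assumed fallback: uniform weights; the paper uses a Gaussian weight matrix W.
        weights = torch.ones(len(offsets))
    weights = weights / weights.sum()

    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for w, s in zip(weights, offsets):
            # Shift the clip along the temporal axis (dim 2 for B, C, T, H, W input).
            shifted = torch.roll(x_adv, shifts=int(s), dims=2)
            loss = F.cross_entropy(model(shifted), y)
            g, = torch.autograd.grad(loss, x_adv)
            grad = grad + w * g
        # Ascent step; sign() follows common I-FGSM practice (an assumption here),
        # then project back into the eps-ball around x and the valid pixel range.
        step = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(step, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```

In practice the `weights` argument would carry the Gaussian weight matrix W from the setup sketch below.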
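
A minimal configuration sketch of the setup quoted in the Experiment Setup row. The numeric values (ϵ = 16, I = 10, α = 1.6, L = 7, a 64-frame crop with every other frame skipped, 224 × 224 input) come from the paper; the Gaussian standard deviation, the frame-array layout, and the helper names `gaussian_weights` / `sample_clip` are hypothetical.

```python
import numpy as np

EPS = 16              # maximum perturbation epsilon (0-255 pixel scale)
ITERS = 10            # iteration number I
ALPHA = EPS / ITERS   # step size alpha = 1.6
SHIFT_L = 7           # temporal shift length L

def gaussian_weights(L=SHIFT_L, sigma=3.0):
    """Weights over shifts -L..L drawn from a Gaussian (sigma is assumed)."""
    offsets = np.arange(-L, L + 1)
    w = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def sample_clip(frames):
    """Randomly crop 64 consecutive frames, then keep every other frame.

    `frames` is assumed to have shape (T, H, W, C); resizing/cropping to
    224x224 is omitted for brevity.
    """
    start = np.random.randint(0, max(1, len(frames) - 64 + 1))
    return frames[start:start + 64:2]  # 32 frames after skipping every other
```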