Global-Local Characteristic Excited Cross-Modal Attacks from Images to Videos

Authors: Ruikui Wang, Yuanfang Guo, Yunhong Wang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the UCF-101 and Kinetics-400 datasets validate that the proposed method significantly improves cross-modal transferability and even surpasses stronger baselines that use video models as the substitute model.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Beihang University, China; 2 Zhongguancun Laboratory, Beijing, China. {rkwang, andyguo, yhwang}@buaa.edu.cn
Pseudocode | Yes | Algorithm 1: Global-Local Characteristic Excited Cross-Modal Attack.
Open Source Code | Yes | Our source codes are available at https://github.com/lwmming/Cross-Modal-Attack.
Open Datasets | Yes | Two video recognition datasets, UCF-101 (Soomro, Zamir, and Shah 2012) and Kinetics-400 (Carreira and Zisserman 2017), are used for evaluations. ImageNet-pretrained image models are used as substitute models.
Dataset Splits | No | The paper does not explicitly describe train/validation/test dataset splits, or how validation was performed for model training. It mentions 'evaluations' and 'Attack Success Rate', but gives no specific data splits for validation or hyperparameter tuning.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions models such as AlexNet, ResNet, SqueezeNet, VGG, TPN, and SlowFast, but does not specify software versions (e.g., Python, PyTorch, CUDA) used for the implementation.
Experiment Setup | Yes | For the optimization strategy, we set the maximum perturbation ϵ as 16.0, step size α as 0.005, number of iterations I as 60, and λ in Eq. 5 as 0.01. For the intermediate layer l in Eq. 3, we select features.7 for AlexNet, layer2 for ResNet-101, features.6.expand3x3_activation for SqueezeNet, and features.20 for VGG-16, which is consistent with I2V. In practice, we set n1 as 2 and n2 as 3.
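The hyperparameters in the Experiment Setup row describe a standard iterative attack: I sign-gradient steps of size α on a perturbation kept inside an L∞ ball of radius ϵ. The sketch below shows only that outer loop; it is a minimal pure-Python illustration, not the authors' implementation. The feature-level loss of Eq. 3/Eq. 5 is abstracted into a caller-supplied gradient, the `attack_step`/`sign` names are hypothetical, and the 0-255 pixel scale for ϵ is an assumption.

```python
# Reported hyperparameters (scales as given in the paper; pixel range assumed 0-255).
EPSILON = 16.0   # maximum L-infinity perturbation
ALPHA = 0.005    # step size per iteration
ITERATIONS = 60  # number of iterations I
LAMBDA = 0.01    # weight of the second loss term in Eq. 5 (lives inside the loss,
                 # which this sketch abstracts away)

def sign(x):
    # Returns -1, 0, or 1, matching the sign of x.
    return (x > 0) - (x < 0)

def attack_step(delta, grad, alpha=ALPHA, epsilon=EPSILON):
    """One update on the perturbation: move each element along the sign of the
    loss gradient, then clip back into the epsilon-ball. Operates element-wise
    on flat lists of floats; `grad` stands for the gradient of the (global-local
    feature) loss w.r.t. the perturbation, supplied by the caller."""
    return [max(-epsilon, min(epsilon, d + alpha * sign(g)))
            for d, g in zip(delta, grad)]

def run_attack(delta, grad_fn, iterations=ITERATIONS):
    """Full loop: I iterations of attack_step, re-querying the gradient each time."""
    for _ in range(iterations):
        delta = attack_step(delta, grad_fn(delta))
    return delta
```

In the paper the gradient would come from backpropagating the intermediate-feature loss through the ImageNet-pretrained substitute model; any autograd framework can supply it, after which the update and projection above are the entire optimization strategy.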