Sparse Adversarial Perturbations for Videos
Authors: Xingxing Wei, Jun Zhu, Sha Yuan, Hang Su
AAAI 2019, pp. 8973-8980
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the UCF101 dataset demonstrate that even if only one frame in a video is perturbed, the fooling rate can still reach 59.7%. We choose the widely used dataset in action recognition: UCF101 (Soomro, Zamir, and Shah 2012). Metrics: We use three metrics to evaluate various aspects. Fooling ratio (F): the percentage of adversarial videos that are successfully misclassified (Moosavi-Dezfooli et al. 2016). Perceptibility (P): the perceptibility score of the adversarial perturbation r, measured by the Mean Absolute Perturbation (MAP): P = (1/N) Σ_i \|r_i\|, where N is the number of pixels and r_i is the intensity vector (3-dimensional in the RGB color space). Sparsity (S): the proportion of frames with no perturbations (clean frames) among all the frames in a specific video needed to fool DNNs. (A sketch of these metrics appears below the table.) |
| Researcher Affiliation | Academia | Xingxing Wei, Jun Zhu, Sha Yuan, Hang Su Dept. of Comp. Sci. & Tech., Institute for Artificial Intelligence, State Key Lab for Intell. Tech. & Sys., THBI Lab, Tsinghua University, Beijing, China {xwei11, dcszj, yuansha, suhangss}@mail.tsinghua.edu.cn |
| Pseudocode | No | The paper presents optimization problems (Eq. 1-4) and discusses algorithms like Adam, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We choose the widely used dataset in action recognition: UCF101 (Soomro, Zamir, and Shah 2012). |
| Dataset Splits | No | It contains 13,320 videos with 101 action classes covering a broad set of activities such as sports, musical instruments, body-motion, human-human interaction, and human-object interaction. The split places more than 8,000 videos in the training set and more than 3,000 videos in the testing set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam (Kingma and Ba 2014) algorithm' but does not specify any software names with version numbers for libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | λ in problems (3, 4) is set to a constant, which is tuned on the training set. Because the l2,1 norm is used, initializing the perturbations with zeros would lead to NaN values, so we instead initialize them with a small value; in the experiments, we use 0.0001. By default, we set N = 20 and use the l2 norm in the following experiments. (A sketch of this initialization issue appears below the table.) |
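
The three metrics quoted in the Research Type row can be computed directly from model predictions and the perturbation tensor. Below is a minimal NumPy sketch, assuming a perturbation of shape (frames, height, width, 3); the function names, and the reading of \|r_i\| as the l2 norm of the per-pixel RGB vector, are assumptions on our part, since the paper releases no code.

```python
# Minimal sketch of the three evaluation metrics quoted above; array
# shapes and helper names are assumptions, not the authors' code.
import numpy as np

def fooling_ratio(clean_preds, adv_preds):
    """F: fraction of adversarial videos whose predicted label changed."""
    clean_preds = np.asarray(clean_preds)
    adv_preds = np.asarray(adv_preds)
    return float(np.mean(clean_preds != adv_preds))

def mean_absolute_perturbation(perturbation):
    """P = (1/N) * sum_i |r_i|, with N the number of pixels and r_i the
    3-dimensional RGB intensity vector of pixel i.

    `perturbation` is assumed to have shape (frames, height, width, 3);
    |r_i| is read here as the l2 norm of the per-pixel RGB vector."""
    r = np.asarray(perturbation, dtype=np.float64)
    n_pixels = r.size // 3  # N counts pixels, not channels
    per_pixel_norms = np.linalg.norm(r.reshape(-1, 3), axis=1)
    return per_pixel_norms.sum() / n_pixels

def sparsity(perturbation, tol=0.0):
    """S: proportion of frames left completely clean (no perturbation)."""
    r = np.asarray(perturbation)
    frame_energy = np.abs(r).reshape(r.shape[0], -1).max(axis=1)
    return float(np.mean(frame_energy <= tol))
```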
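
The Experiment Setup row notes that zero initialization breaks under the l2,1 norm. The reason is that the gradient of a frame's l2 norm at an all-zero frame is 0/0, which can surface as NaN during backpropagation. The following is a minimal sketch of the group-sparsity term and the small-value initialization, with assumed tensor shapes; PyTorch is our choice here, as the paper does not name its framework.

```python
# Sketch of the l2,1 group-sparsity term and the small-value
# initialization mentioned in the Experiment Setup row.
import torch

def l21_norm(perturbation):
    """l2,1 norm over frames: sum over frames of the l2 norm of each
    frame's perturbation, which drives whole frames to exactly zero."""
    per_frame = perturbation.reshape(perturbation.shape[0], -1)
    return per_frame.norm(p=2, dim=1).sum()

frames, h, w = 20, 112, 112  # N = 20 frames, as in the paper; h, w assumed
# A zero init makes the gradient of the l2 norm undefined at 0 (0/0),
# so the perturbation is initialized with a small constant instead.
r = torch.full((frames, h, w, 3), 1e-4, requires_grad=True)

loss = l21_norm(r)  # the adversarial loss term is omitted for brevity
loss.backward()
assert not torch.isnan(r.grad).any()  # finite gradients with nonzero init
```

In the paper, this regularizer is combined with an adversarial loss and minimized with Adam (Kingma and Ba 2014); the sketch keeps only the term relevant to the initialization issue.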