Temporal-Distributed Backdoor Attack against Video Based Action Recognition

Authors: Xi Li, Songhe Wang, Ruiquan Huang, Mahanth Gowda, George Kesidis

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of the proposed attack is demonstrated by extensive experiments with various well-known models on two video recognition benchmarks, UCF101 and HMDB51, and a sign language recognition benchmark, the Greek Sign Language (GSL) dataset. We delve into the impact of several influential factors on our proposed attack and identify an intriguing effect termed collateral damage through extensive studies.
Researcher Affiliation | Academia | The Pennsylvania State University {xzl45, sxw5765, rzh5514, mkg31, gik2}@psu.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | Datasets: We consider two benchmark datasets used in video action recognition, UCF-101 (Soomro, Zamir, and Shah 2012) and HMDB-51 (Kuehne et al. 2011), and a sign language recognition benchmark, the Greek Sign Language (GSL) dataset (Adaloglou et al. 2022).
Dataset Splits | No | The paper does not explicitly state the training, validation, and test splits (as percentages or counts) used in its experiments. While it mentions a "test set" and "training samples", no validation split is specified.
Hardware Specification | No | The paper does not describe the hardware used for the experiments, such as GPU models, CPU specifications, or cloud computing instance types.
Software Dependencies | No | The paper mentions using the "AdamW optimizer (Loshchilov and Hutter 2019)" but does not specify version numbers for any programming language, library, or other software dependency needed for replication.
Experiment Setup | Yes | Training Settings: We train all the models on all the datasets for 10 epochs, using the AdamW optimizer (Loshchilov and Hutter 2019) with an initial learning rate of 0.0003. Following the common training strategy in video recognition (Hammoud et al. 2023) and for reducing computation cost, we down-sample the videos into 32 frames.
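The quoted training settings map onto a standard PyTorch recipe. The following is a minimal sketch of that setup, assuming a hypothetical video classification model and train_loader; only the hyperparameters (10 epochs, AdamW with an initial learning rate of 0.0003, 32-frame temporal down-sampling) come from the paper, while the loop structure and the uniform frame-sampling helper are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn

    # Hyperparameters reported in the paper's training settings.
    NUM_EPOCHS = 10
    LEARNING_RATE = 3e-4   # initial learning rate for AdamW
    NUM_FRAMES = 32        # videos are temporally down-sampled to 32 frames

    def downsample_frames(video: torch.Tensor, num_frames: int = NUM_FRAMES) -> torch.Tensor:
        """Uniformly sample num_frames frames from a (T, C, H, W) video tensor (assumed sampling scheme)."""
        t = video.shape[0]
        indices = torch.linspace(0, t - 1, num_frames).long()
        return video[indices]

    def train(model: nn.Module, train_loader, device: str = "cuda") -> None:
        """Minimal training loop matching the reported optimizer, learning rate, and epoch count."""
        model.to(device).train()
        optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
        criterion = nn.CrossEntropyLoss()
        for epoch in range(NUM_EPOCHS):
            for videos, labels in train_loader:  # videos assumed shaped (B, T, C, H, W)
                videos = torch.stack([downsample_frames(v) for v in videos]).to(device)
                labels = labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(videos), labels)
                loss.backward()
                optimizer.step()
            print(f"epoch {epoch + 1}/{NUM_EPOCHS} done, last batch loss {loss.item():.4f}")

Any video backbone accepting a (batch, frames, channels, height, width) clip tensor could be dropped into this loop; the paper itself does not specify batch size or data augmentation, so those are left to the reader.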