Improve Video Representation with Temporal Adversarial Augmentation

Authors: Jinhao Duan, Quanfu Fan, Hao Cheng, Xiaoshuang Shi, Kaidi Xu

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-Something V1 & V2 and Diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models with notable margins without introducing additional parameters or computational costs.
Researcher Affiliation | Collaboration | Jinhao Duan (Drexel University), Quanfu Fan (Amazon), Hao Cheng (The Hong Kong University of Science and Technology (Guangzhou)), Xiaoshuang Shi (University of Electronic Science and Technology of China), and Kaidi Xu (Drexel University)
Pseudocode | Yes | The pseudo-code of TAF is shown in Appendix A.
Open Source Code | Yes | Code is available at https://github.com/jinhaoduan/TAF.
Open Datasets | Yes | We evaluate TAF on three popular temporal datasets: Something-Something V1 & V2 [Goyal et al., 2017] and Diving48 [Li et al., 2018b].
Dataset Splits | No | The paper mentions 'top-1 training accuracy vs. top-1 validation accuracy' and uses pre-trained models with their initial training settings, implying standard splits. However, it does not explicitly state the percentages or sample counts for the training, validation, and test splits used in its experiments.
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU models, CPU types, or memory configurations; it only mentions 'computational overheads' in general terms.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies, such as the programming languages, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | For fine-tuning, we load pre-trained weights and continue training for 15 epochs with TAF. We conduct 3 trials for each experiment and report the mean results. The initial training settings (e.g., learning rate, batch size, dropout) are the same as those used when the pre-trained models were logged. The learning rates are decayed by a factor of 10 after 10 epochs. We set α to 0.7 and the number of attacked frames N to 8 or 16, according to the input temporal length. All performances reported in this paper are evaluated on 1 center crop and 1 clip, with an input resolution of 224 × 224.
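The fine-tuning recipe quoted above translates into a fairly standard adversarial-augmentation loop. Below is a minimal PyTorch-style sketch, not the authors' implementation (that lives in the linked repository and Appendix A): `fgsm_on_frames`, `load_pretrained_video_model`, and `make_train_loader` are hypothetical stand-ins, the FGSM-style frame attack is a generic substitute for the actual TAF perturbation, and the way α mixes the clean and adversarial losses is an assumption rather than the paper's definition. Only the epoch count, LR decay factor, α value, N, and input resolution are taken from the Experiment Setup row.

```python
import torch
import torch.nn.functional as F

def fgsm_on_frames(model, clips, labels, n_frames, eps=2 / 255):
    """Generic FGSM-style stand-in for the TAF attack (Appendix A of the paper):
    perturb only the first `n_frames` frames of each clip. Assumes inputs in [0, 1]."""
    clips = clips.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(clips), labels), clips)[0]
    adv = clips.detach().clone()
    adv[:, :n_frames] += eps * grad[:, :n_frames].sign()
    return adv.clamp(0, 1)

# Hypothetical loaders: swap in the TSM/GST/TAM/TPN checkpoints and dataloaders
# from the TAF repository (https://github.com/jinhaoduan/TAF).
model = load_pretrained_video_model()
train_loader = make_train_loader(resolution=224)   # clips shaped (B, T, C, 224, 224)

base_lr = 0.001  # placeholder; the paper reuses each pre-trained model's original setting
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
# Decay the learning rate by a factor of 10 after 10 of the 15 fine-tuning epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.1)

alpha = 0.7       # value quoted in the Experiment Setup row
n_attacked = 8    # N = 8 or 16, matching the input temporal length

for epoch in range(15):
    for clips, labels in train_loader:
        adv_clips = fgsm_on_frames(model, clips, labels, n_frames=n_attacked)
        clean_loss = F.cross_entropy(model(clips), labels)
        adv_loss = F.cross_entropy(model(adv_clips), labels)
        # Assumed role of alpha: a convex mix of clean and adversarial losses.
        loss = alpha * clean_loss + (1 - alpha) * adv_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Evaluation would then use a single center crop and a single clip at 224 × 224, as reported, so any accuracy comparison against the published numbers should follow that protocol rather than multi-crop/multi-clip testing.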