Improve Video Representation with Temporal Adversarial Augmentation
Authors: Jinhao Duan, Quanfu Fan, Hao Cheng, Xiaoshuang Shi, Kaidi Xu
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TAF with four powerful models (TSM, GST, TAM, and TPN) over three challenging temporal-related benchmarks (Something-something V1&V2 and Diving48). Experimental results demonstrate that TAF effectively improves the test accuracy of these models with notable margins without introducing additional parameters or computational costs. |
| Researcher Affiliation | Collaboration | Jinhao Duan (Drexel University), Quanfu Fan (Amazon), Hao Cheng (The Hong Kong University of Science and Technology (Guangzhou)), Xiaoshuang Shi (University of Electronic Science and Technology of China), and Kaidi Xu (Drexel University) |
| Pseudocode | Yes | The pseudo-code of TAF is shown in Appendix A. (A hedged sketch of the augmentation step appears after this table.) |
| Open Source Code | Yes | Code is available at https://github.com/jinhaoduan/TAF. |
| Open Datasets | Yes | We evaluate TAF on three popular temporal datasets: Something-something V1&V2 [Goyal et al., 2017] and Diving48 [Li et al., 2018b]. |
| Dataset Splits | No | The paper mentions 'top-1 training accuracy vs top-1 validation accuracy' and uses pre-trained models with their initial training settings, implying standard splits. However, it does not explicitly state the specific percentages or sample counts for the training, validation, and test splits used in this paper's experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. It only mentions 'computational overheads' generally. |
| Software Dependencies | No | The paper does not specify the version numbers for any software dependencies, such as programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | For fine-tuning, we load pre-trained weights and continue training for 15 epochs with TAF. We conduct 3 trials for each experiment and report the mean results. The initial training settings (e.g., learning rate, batch size, dropout, etc.) are the same as those used when the pre-trained models were logged. The learning rates are decayed by a factor of 10 after 10 epochs. We set α to 0.7 and the number of attacked frames N to 8 or 16 according to the input temporal length. All performances reported in this paper are evaluated on 1 center crop and 1 clip, with input resolution 224 × 224. (A hedged sketch of this fine-tuning schedule follows the table.) |
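For readers without access to Appendix A, here is a minimal sketch of what a temporal adversarial augmentation step could look like. It is an assumption-laden stand-in, not the paper's algorithm: the real TA objective shifts the model's temporal attention distributions, whereas this sketch runs a generic PGD-style loss-ascent loop; the function name `temporal_adversarial_augment` and the `epsilon`/`steps`/`step_size` parameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def temporal_adversarial_augment(model, clip, label,
                                 epsilon=4 / 255, steps=1, step_size=1 / 255):
    """Hypothetical stand-in for the paper's Temporal Adversarial (TA) step.

    `clip` is a video tensor of shape (B, T, C, H, W). The loop below runs
    generic PGD-style gradient ascent on the classification loss; the actual
    TA objective in the paper instead targets the model's temporal attention
    (see Appendix A of the paper and the released code).
    """
    model.eval()
    adv = clip.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(adv), label)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + step_size * grad.sign()                  # ascend the loss
            adv = clip + (adv - clip).clamp(-epsilon, epsilon)   # stay in the eps-ball
            adv = adv.clamp(0.0, 1.0)                            # keep valid pixels
        adv.requires_grad_(True)
    return adv.detach()
```

The paper sets the number of attacked frames N to 8 or 16 according to the input temporal length; this sketch perturbs every frame, which matches the case where N equals the clip length.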
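The fine-tuning schedule in the Experiment Setup row (15 epochs, learning rate decayed by 10x after epoch 10, α = 0.7) maps naturally onto a standard PyTorch loop. The skeleton below is a hedged illustration only: the dummy model, the synthetic batch, and the way α mixes the clean and adversarial loss terms are all assumptions; the released code at https://github.com/jinhaoduan/TAF defines the actual loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in model and synthetic batch so the skeleton runs end to end;
# the real experiments fine-tune TSM/GST/TAM/TPN checkpoints instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 3 * 32 * 32, 10))
train_loader = [(torch.rand(2, 8, 3, 32, 32), torch.randint(0, 10, (2,)))]

ALPHA = 0.7  # reported mixing weight; how it enters the loss is an assumption here
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Learning rate decayed by a factor of 10 after 10 of the 15 epochs, as reported.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10], gamma=0.1)

for epoch in range(15):
    for clip, label in train_loader:
        # Reuses temporal_adversarial_augment from the sketch above.
        adv_clip = temporal_adversarial_augment(model, clip, label)
        model.train()
        loss = (ALPHA * F.cross_entropy(model(adv_clip), label)
                + (1 - ALPHA) * F.cross_entropy(model(clip), label))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```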