TG-VQA: Ternary Game of Video Question Answering

Authors: Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to validate our model on three Video QA datasets, including MSVD-QA, MSRVTT-QA, and ActivityNet-QA. The empirical results and ablative studies show our method consistently achieves significant improvements (more than 5%) on all benchmarks.
Researcher Affiliation | Collaboration | 1 School of Electronic and Computer Engineering, Peking University, Shenzhen, China; 2 Shanghai AI Laboratory, Shanghai, China; 3 AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School; 4 Peng Cheng Laboratory, Shenzhen, China; 5 Department of Automation and BNRist, Tsinghua University
Pseudocode | No | The paper describes its model architecture and components in text and diagrams, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain an explicit statement offering the source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We select multiple Video QA datasets to comprehensively evaluate the effectiveness of our method on different-length videos. Following the VQA-T [Yang et al., 2022a] setting, we choose two short video datasets (MSVD-QA [Xu et al., 2017], MSRVTT-QA [Xu et al., 2017]) and one long video dataset (ActivityNet-QA [Yu et al., 2019]) as our evaluation benchmarks.
Dataset Splits | No | We select multiple Video QA datasets to comprehensively evaluate the effectiveness of our method on different-length videos. Following the VQA-T [Yang et al., 2022a] setting, we choose two short video datasets (MSVD-QA, MSRVTT-QA) and one long video dataset (ActivityNet-QA) as our evaluation benchmarks. (The paper states the datasets used and mentions following VQA-T's setting, but does not provide the specific split details within this paper.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using ViT and BERT as backbones but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | In order to explore the effect of the ternary game loss L_TG's hyperparameter on the performance of the model, we train our TG-VQA on the MSRVTT-QA dataset with hyperparameter α ranging from 0.1 to 1.5. As shown in Figure 4(a), the model performance fluctuates within the range [45.1, 46.3]. When α = 0.5, our model performs best.