TG-VQA: Ternary Game of Video Question Answering
Authors: Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to validate our model on three Video QA datasets, including MSVD-QA, MSRVTT-QA, and ActivityNet-QA. The empirical results and ablative studies show our method consistently achieves significant improvements (more than 5%) on all benchmarks. |
| Researcher Affiliation | Collaboration | 1 School of Electronic and Computer Engineering, Peking University, Shenzhen, China; 2 Shanghai AI Laboratory, Shanghai, China; 3 AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School; 4 Peng Cheng Laboratory, Shenzhen, China; 5 Department of Automation and BNRist, Tsinghua University |
| Pseudocode | No | The paper describes its model architecture and components in text and diagrams, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain an explicit statement offering the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We select multiple Video QA datasets to comprehensively evaluate the effectiveness of our method on different-length videos. Following the VQA-T [Yang et al., 2022a] setting, we choose two short video datasets (MSVD-QA [Xu et al., 2017], MSRVTT-QA [Xu et al., 2017]) and one long video dataset (ActivityNet-QA [Yu et al., 2019]) as our evaluation benchmarks. |
| Dataset Splits | No | We select multiple Video QA datasets to comprehensively evaluate the effectiveness of our method on different-length videos. Following the VQA-T [Yang et al., 2022a] setting, we choose two short video datasets (MSVD-QA, MSRVTT-QA) and one long video dataset (ActivityNet-QA) as our evaluation benchmarks. (The paper states the datasets used and mentions following VQA-T's setting, but does not provide the specific split details within this paper.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using ViT and BERT as backbones but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In order to explore the effect of the ternary game loss L_TG's hyperparameter on the performance of the model, we train our TG-VQA on the MSRVTT-QA dataset with hyperparameter α from 0.1 to 1.5. As shown in Figure 4(a), the model performance fluctuates within the range [45.1, 46.3]. When α = 0.5, our model performs best. |