Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
TG-VQA: Ternary Game of Video Question Answering
Authors: Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to validate our model on three Video QA datasets, including MSVD-QA, MSRVTT-QA, and ActivityNet-QA. The empirical results and ablative studies show our method consistently achieves significant improvements (more than 5%) on all benchmarks. |
| Researcher Affiliation | Collaboration | (1) School of Electronic and Computer Engineering, Peking University, Shenzhen, China; (2) Shanghai AI Laboratory, Shanghai, China; (3) AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School; (4) Peng Cheng Laboratory, Shenzhen, China; (5) Department of Automation and BNRist, Tsinghua University |
| Pseudocode | No | The paper describes its model architecture and components in text and diagrams, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain an explicit statement offering the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We select multiple Video QA datasets to comprehensively evaluate the effectiveness of our method on different-length videos. Following the VQA-T [Yang et al., 2022a] setting, we choose two short video datasets (MSVD-QA [Xu et al., 2017], MSRVTT-QA [Xu et al., 2017]) and one long video dataset (ActivityNet-QA [Yu et al., 2019]) as our evaluation benchmarks. |
| Dataset Splits | No | We select multiple Video QA datasets to comprehensively evaluate the effectiveness of our method on different-length videos. Following the VQA-T [Yang et al., 2022a] setting, we choose two short video datasets (MSVD-QA, MSRVTT-QA) and one long video dataset (ActivityNet-QA) as our evaluation benchmarks. (The paper names the datasets and states that it follows the VQA-T setting, but it does not give the specific split details itself.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using ViT and BERT as backbones but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In order to explore the effect of the ternary game loss $L_{TG}$'s hyperparameter on the performance of the model, we train our TG-VQA on the MSRVTT-QA dataset with hyperparameter α from 0.1 to 1.5. Shown in Figure 4(a), the model performance fluctuates in range [45.1, 46.3]. When α = 0.5, our model performs best. |