Inferential Knowledge-Enhanced Integrated Reasoning for Video Question Answering

Authors: Jianguo Mao, Wenbin Jiang, Hong Liu, Xiangdong Wang, Yajuan Lyu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method achieves significant improvement on two mainstream datasets. The ablation study further demonstrates the effectiveness of each component of our approach.
Researcher Affiliation | Collaboration | 1 Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 2 University of Chinese Academy of Sciences, Beijing, China; 3 Baidu Inc., Beijing, China
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We evaluate our method on two video question answering datasets that contain character dialogues. We use standard Train/Val/Test-public splits and accuracy to measure the performance. Datasets. TVQA: TVQA is a widely used multi-choice video question answering dataset. KnowIT VQA: KnowIT VQA is another popular multi-choice video question answering dataset.
Dataset Splits | Yes | We use standard Train/Val/Test-public splits and accuracy to measure the performance.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions software components like 'ResNet-101', 'BERT', 'AdamW optimizer', and 'GPT' but does not specify their version numbers for reproducibility.
Experiment Setup | Yes | We set batch size as 16 and use AdamW optimizer with an initial learning rate of 0.00005. About the Inferential Knowledge Reasoner, the model parameters are initialized with the pre-trained parameters from the GPT. We set batch size as 128, use AdamW optimizer with an initial learning rate of 0.00005, set beam search size as 5, and set the maximum decoding step as 35.
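The settings quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help anyone attempting a reproduction keep the two training stages apart. This is only an illustration and not the authors' code, since none was released. The class names, field names, and the `accuracy` helper are hypothetical. The summary does not say which component the batch size of 16 applies to, so `AnswerModelConfig` assumes it is the main question answering model. Only the numeric values, the optimizer name, and the GPT initialization come from the text above.

```python
from dataclasses import asdict, dataclass, field
from typing import Sequence


@dataclass(frozen=True)
class OptimizerConfig:
    """Optimizer settings reported for both training stages."""
    name: str = "AdamW"
    initial_learning_rate: float = 5e-5  # reported as 0.00005


@dataclass(frozen=True)
class AnswerModelConfig:
    """Assumption: the batch size of 16 belongs to the main QA model."""
    batch_size: int = 16
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)


@dataclass(frozen=True)
class KnowledgeReasonerConfig:
    """Settings reported for the Inferential Knowledge Reasoner."""
    init_from: str = "GPT"  # exact checkpoint and version are not stated
    batch_size: int = 128
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)
    beam_size: int = 5
    max_decoding_steps: int = 35


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of multi-choice questions whose predicted option matches gold."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Dump both stage configurations as plain dictionaries for logging.
    print(asdict(AnswerModelConfig()))
    print(asdict(KnowledgeReasonerConfig()))
```

Keeping the two stages in separate frozen dataclasses makes it harder to confuse the two batch sizes (16 and 128), which share an optimizer and learning rate in the reported setup.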