Inferential Knowledge-Enhanced Integrated Reasoning for Video Question Answering
Authors: Jianguo Mao, Wenbin Jiang, Hong Liu, Xiangdong Wang, Yajuan Lyu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method achieves significant improvement on two mainstream datasets. The ablation study further demonstrates the effectiveness of each component of our approach. |
| Researcher Affiliation | Collaboration | 1) Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; 2) University of Chinese Academy of Sciences, Beijing, China; 3) Baidu Inc., Beijing, China |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our method on two video question answering datasets that contain character dialogues. We use standard Train/Val/Test-public splits and accuracy to measure performance. TVQA is a widely used multi-choice video question answering dataset; KnowIT VQA is another popular multi-choice video question answering dataset. |
| Dataset Splits | Yes | We use standard Train/Val/Test-public splits and accuracy to measure performance. (The metric is sketched in code below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions software components like 'ResNet-101', 'BERT', the 'AdamW optimizer', and 'GPT' but does not specify their version numbers for reproducibility. (A hedged instantiation sketch follows the table.) |
| Experiment Setup | Yes | We set the batch size to 16 and use the AdamW optimizer with an initial learning rate of 0.00005. For the Inferential Knowledge Reasoner, the model parameters are initialized with the pre-trained parameters from GPT; we set the batch size to 128, use the AdamW optimizer with an initial learning rate of 0.00005, set the beam search size to 5, and set the maximum decoding step to 35. (These numbers are wired into a hedged configuration sketch below the table.) |
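
The Dataset Splits row reports accuracy as the sole metric. Below is a minimal sketch of multi-choice accuracy, assuming predictions and gold answers are given as lists of answer-choice indices; the function name and data layout are illustrative assumptions, not details from the paper.

```python
def accuracy(predictions, answers):
    """Fraction of questions whose predicted choice index matches the gold index."""
    correct = sum(int(p == a) for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical toy example: 3 of 4 questions answered correctly.
print(accuracy([0, 2, 1, 3], [0, 2, 2, 3]))  # 0.75
```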
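
The Software Dependencies row notes that ResNet-101, BERT, AdamW, and GPT are named without versions. The sketch below shows one plausible way to instantiate those components with torchvision and Hugging Face transformers; the specific checkpoints (ImageNet weights, bert-base-uncased, gpt2) are assumptions, since the paper does not name them.

```python
import torchvision.models as models
from transformers import BertModel, BertTokenizer, GPT2LMHeadModel, GPT2Tokenizer

# Frame feature extractor: the paper names ResNet-101
# (weights unspecified; ImageNet pre-training assumed here).
resnet101 = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1).eval()

# Text encoder: the paper names BERT (checkpoint unspecified; base-uncased assumed).
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Inferential Knowledge Reasoner initialization: "pre-trained parameters
# from the GPT" (checkpoint unspecified; GPT-2 small assumed).
gpt_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt = GPT2LMHeadModel.from_pretrained("gpt2")
```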
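
The Experiment Setup row quotes the only hyperparameters the paper reports. The sketch below wires those numbers into a standard PyTorch/transformers configuration; everything other than the reported values (batch sizes 16 and 128, learning rate 0.00005, beam size 5, 35 decoding steps) is an assumption, including the GPT-2 checkpoint, the prompt, and the mapping of "maximum decoding step" to `max_new_tokens`.

```python
import torch
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Inferential Knowledge Reasoner: initialized from pre-trained GPT
# (GPT-2 small assumed; the paper does not name the checkpoint).
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
reasoner = GPT2LMHeadModel.from_pretrained("gpt2")

# Reported optimizer settings: AdamW with an initial learning rate of 0.00005.
# Reported batch sizes: 16 for the main QA model, 128 for the reasoner.
optimizer = AdamW(reasoner.parameters(), lr=5e-5)
qa_batch_size, reasoner_batch_size = 16, 128

# Reported decoding settings: beam search size 5, maximum decoding step 35.
# The prompt below is a hypothetical placeholder, not an input format from the paper.
inputs = tokenizer("Dialogue: ... Question: ...", return_tensors="pt")
with torch.no_grad():
    outputs = reasoner.generate(
        **inputs,
        num_beams=5,         # beam search size 5 (from the paper)
        max_new_tokens=35,   # maximum decoding step 35 (from the paper)
        early_stopping=True,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```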