KnowIT VQA: Answering Knowledge-Based Questions about Videos

Authors: Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima

AAAI 2020, pp. 10826-10834 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy, indicating its usefulness for studying current video modelling limitations. ... Experimental Results: We evaluated and compared ROCK against several baselines on the KnowIT VQA dataset. Results per question type and overall accuracy are reported in Table 4.
Researcher Affiliation | Collaboration | Noa Garcia (1), Mayu Otani (2), Chenhui Chu (1), Yuta Nakashima (1); affiliations: (1) Osaka University, Japan, (2) Cyber Agent, Inc., Japan
Pseudocode | No | The paper describes the ROCK model and its components using text and a diagram (Figure 6), but it does not include any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a link to the KnowIT VQA dataset (https://knowit-vqa.github.io/), but it does not explicitly state that the source code for the described methodology is openly available, nor does it provide a link to a code repository for it.
Open Datasets | Yes | We introduce KnowIT VQA, a dataset for KBVQA in videos... Available at https://knowit-vqa.github.io/
Dataset Splits | Yes | We randomly split the episodes into training, validation, and test sets, so that questions and clips from the same episode were assigned to the same set. The number of episodes, clips, and QA pairs in each split are detailed in Table 2... Table 2: KnowIT VQA data splits and the average lengths (Train / Val / Test / Total): Episodes 167 / 20 / 20 / 207; Scenes 2,007 / 225 / 240 / 2,472; Clips 9,731 / 1,178 / 1,178 / 12,087; QAs 19,569 / 2,352 / 2,361 / 24,282. (An episode-level split sketch is given after the table.)
Hardware Specification | No | The paper mentions that "Models were trained with stochastic gradient descent" and refers to "BERT implementations", but it does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software components like the "BERT network" and "Resnet50" by citing their respective papers, but it does not specify exact version numbers for any software or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Models were trained with stochastic gradient descent with momentum of 0.9 and learning rate of 0.001. In BERT implementations, we used the uncased base model with pre-trained initialisation. (A configuration sketch follows the table.)
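
To make the episode-level protocol in the Dataset Splits row concrete, the sketch below partitions QA pairs by episode so that no episode contributes to more than one split. The 167 / 20 / 20 episode counts come from Table 2 as quoted above; the file name and the column schema (an "episode" field per QA pair) are assumptions for illustration, not the authors' released format.

    # Minimal sketch of an episode-level train/val/test split. The CSV file
    # name and the "episode" column are hypothetical placeholders.
    import random

    import pandas as pd

    qa = pd.read_csv("knowit_annotations.csv")  # hypothetical file name

    episodes = sorted(qa["episode"].unique())
    random.seed(0)
    random.shuffle(episodes)

    # Episode counts per split as reported in Table 2 (167 / 20 / 20 of 207).
    train_eps = set(episodes[:167])
    val_eps = set(episodes[167:187])
    test_eps = set(episodes[187:207])

    splits = {
        "train": qa[qa["episode"].isin(train_eps)],
        "val": qa[qa["episode"].isin(val_eps)],
        "test": qa[qa["episode"].isin(test_eps)],
    }

    # Every QA pair (and hence every clip) from a given episode lands in
    # exactly one split, matching the protocol described in the paper.
    for name, df in splits.items():
        print(name, len(df), "QA pairs from", df["episode"].nunique(), "episodes")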
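
The Experiment Setup row reports only the optimiser (stochastic gradient descent, momentum 0.9, learning rate 0.001) and the BERT variant (uncased base model with pre-trained initialisation). A minimal configuration sketch along those lines, using PyTorch and the Hugging Face transformers library, is shown below; the scoring head and the dummy question/answer pair are assumptions, not the authors' ROCK implementation.

    # Sketch of the reported training settings only: SGD with momentum 0.9,
    # learning rate 0.001, and bert-base-uncased with pre-trained weights.
    # The linear head and the example inputs are hypothetical.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")  # uncased base, pre-trained init

    # Hypothetical scoring head on top of the pooled [CLS] representation.
    head = torch.nn.Linear(bert.config.hidden_size, 1)

    # Optimiser settings quoted from the paper.
    params = list(bert.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9)

    # One illustrative update on a dummy question/candidate-answer pair.
    batch = tokenizer("Who knocks on the door?", "Sheldon does.", return_tensors="pt")
    score = head(bert(**batch).pooler_output)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        score, torch.ones_like(score))
    loss.backward()
    optimizer.step()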