Video-Context Aligned Transformer for Video Question Answering
Authors: Linlin Zong, Jiahui Wan, Xianchao Zhang, Xinyue Liu, Wenxin Liang, Bo Xu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of V-CAT on the MSVD-QA and MSRVTT-QA datasets, both achieving state-of-the-art performance. Extended experiments further analyze and demonstrate the effectiveness of each proposed module. |
| Researcher Affiliation | Academia | 1. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116620, China; 2. School of Computer Science and Technology, Dalian University of Technology |
| Pseudocode | No | The paper describes the model architecture and processes in text and diagrams but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology, nor does it explicitly state that the code is publicly available. |
| Open Datasets | Yes | We experiment on the traditional and widely used datasets in the video question answering domain, MSVD-QA (Xu et al. 2017) and MSRVTT-QA (Xu et al. 2016). |
| Dataset Splits | No | The paper mentions separate "train set" and "test set" but does not provide specific dataset split information for a validation set, nor does it detail how data partitioning was done for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments, only general training parameters. |
| Software Dependencies | No | The paper mentions using specific models like ResNet, ResNeXt, and BERT but does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | When employing the MSVD-QA dataset, the number of layers for each module is set to 8, 1, 1, 7, and 4, respectively, searched over the range 1 to 8. For MSRVTT-QA, the numbers are 1, 1, 2, 2, and 1. The loss weight α is set to 1e-5 for MSVD-QA and 1e-6 for MSRVTT-QA, searched from 1e-6 to 1, increasing by a factor of 10 at each step. Throughout training, the model underwent 30 epochs of iterative training with a batch size of 128 and a learning rate of 1e-4. (A configuration sketch follows the table.) |
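The quoted setup is concrete enough to pin down in code. The sketch below is a minimal Python rendering of it: only the numeric values (per-module layer counts, α, epochs, batch size, learning rate, and the two search grids) come from the paper; the `VCAT` class, the loss wiring, and the training-loop internals are hypothetical placeholders, since no code is released.

```python
# Minimal sketch of the training setup quoted above, assuming a
# PyTorch-style training run. Numeric values come from the paper;
# everything else (VCAT model, losses, data loading) is hypothetical.

# Per-dataset hyperparameters reported in the paper: one layer count
# per module (five modules) and the selected loss weight alpha.
CONFIGS = {
    "MSVD-QA":   {"layers": (8, 1, 1, 7, 4), "alpha": 1e-5},
    "MSRVTT-QA": {"layers": (1, 1, 2, 2, 1), "alpha": 1e-6},
}

# Shared training settings reported in the paper.
EPOCHS = 30
BATCH_SIZE = 128
LEARNING_RATE = 1e-4

# Search spaces described in the paper: layer counts from 1 to 8, and
# alpha from 1e-6 to 1, multiplied by 10 at each step.
LAYER_RANGE = range(1, 9)
ALPHA_GRID = [10.0 ** e for e in range(-6, 1)]  # 1e-6, 1e-5, ..., 1.0


def train(dataset: str) -> None:
    """Skeleton of one training run using the reported settings."""
    cfg = CONFIGS[dataset]
    print(f"{dataset}: layers={cfg['layers']} alpha={cfg['alpha']} "
          f"epochs={EPOCHS} batch={BATCH_SIZE} lr={LEARNING_RATE}")
    # Hypothetical wiring (the paper releases no code):
    # model = VCAT(num_layers=cfg["layers"])
    # optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    # total_loss = task_loss + cfg["alpha"] * auxiliary_loss


if __name__ == "__main__":
    for name in CONFIGS:
        train(name)
```

Note that an exhaustive search over `LAYER_RANGE` for five modules would mean 8^5 combinations; the paper does not state its search protocol, so a per-module or greedy search is a plausible but unconfirmed reading.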