Video-Context Aligned Transformer for Video Question Answering
Authors: Linlin Zong, Jiahui Wan, Xianchao Zhang, Xinyue Liu, Wenxin Liang, Bo Xu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of V-CAT on the MSVD-QA and MSRVTT-QA datasets, both achieving state-of-the-art performance. Extended experiments further analyze and demonstrate the effectiveness of each proposed module. |
| Researcher Affiliation | Academia | 1. Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116620, China; 2. School of Computer Science and Technology, Dalian University of Technology |
| Pseudocode | No | The paper describes the model architecture and processes in text and diagrams but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology, nor does it explicitly state that the code is publicly available. |
| Open Datasets | Yes | We experiment on the traditional and widely used datasets in the video question answering domain, MSVD-QA (Xu et al. 2017) and MSRVTT-QA (Xu et al. 2016). |
| Dataset Splits | No | The paper mentions separate "train set" and "test set" but does not provide specific dataset split information for a validation set, nor does it detail how data partitioning was done for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments, only general training parameters. |
| Software Dependencies | No | The paper mentions using specific models like ResNet, ResNeXt, and BERT but does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | When employing the MSVD-QA dataset, the number of layers for each module is set to 8, 1, 1, 7, and 4, respectively, searched over the range 1 to 8. For MSRVTT-QA, the numbers are 1, 1, 2, 2, and 1. The loss weight α is set to 1e-5 for MSVD-QA and 1e-6 for MSRVTT-QA, searched from 1e-6 to 1, increasing by a factor of 10 at each step. Throughout training, the model underwent 30 epochs of iterative training with a batch size of 128 and a learning rate of 1e-4. (A configuration sketch follows the table.) |
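The quoted setup is concrete enough to pin down in code. The sketch below is a minimal Python rendering of it: only the numeric values (per-module layer counts, α, epochs, batch size, learning rate, and the two search grids) come from the paper; the `VCAT` class, the loss wiring, and the training-loop internals are hypothetical placeholders, since no code is released.

```python
# Minimal sketch of the training setup quoted above, assuming a
# PyTorch-style training run. Numeric values come from the paper;
# everything else (VCAT model, losses, data loading) is hypothetical.

# Per-dataset hyperparameters reported in the paper: one layer count
# per module (five modules) and the selected loss weight alpha.
CONFIGS = {
    "MSVD-QA":   {"layers": (8, 1, 1, 7, 4), "alpha": 1e-5},
    "MSRVTT-QA": {"layers": (1, 1, 2, 2, 1), "alpha": 1e-6},
}

# Shared training settings reported in the paper.
EPOCHS = 30
BATCH_SIZE = 128
LEARNING_RATE = 1e-4

# Search spaces described in the paper: layer counts from 1 to 8, and
# alpha from 1e-6 to 1, multiplied by 10 at each step.
LAYER_RANGE = range(1, 9)
ALPHA_GRID = [10.0 ** e for e in range(-6, 1)]  # 1e-6, 1e-5, ..., 1.0


def train(dataset: str) -> None:
    """Skeleton of one training run using the reported settings."""
    cfg = CONFIGS[dataset]
    print(f"{dataset}: layers={cfg['layers']} alpha={cfg['alpha']} "
          f"epochs={EPOCHS} batch={BATCH_SIZE} lr={LEARNING_RATE}")
    # Hypothetical wiring (the paper releases no code):
    # model = VCAT(num_layers=cfg["layers"])
    # optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    # total_loss = task_loss + cfg["alpha"] * auxiliary_loss


if __name__ == "__main__":
    for name in CONFIGS:
        train(name)
```

Note that an exhaustive search over `LAYER_RANGE` for five modules would mean 8^5 combinations; the paper does not state its search protocol, so a per-module or greedy search is a plausible but unconfirmed reading.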