(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
Authors: Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of our approach, we present experiments on the NExT-QA and AVSD-QA datasets. Our results show that our proposed (2.5+1)D representation leads to faster training and inference, while our hierarchical model showcases superior performance on the video QA task versus the state of the art. In this section, we provide experiments demonstrating the empirical benefits of our proposed representation and inference pipeline. |
| Researcher Affiliation | Industry | Anoop Cherian, Chiori Hori, Tim K. Marks, Jonathan Le Roux Mitsubishi Electric Research Labs (MERL), Cambridge, MA {cherian, chori, tmarks, leroux}@merl.com |
| Pseudocode | Yes | Algorithm 1: Identifying common ancestors for merging |
| Open Source Code | No | The paper mentions using code provided by other authors (e.g., 'We use the code provided by the authors of (Xiao et al. 2021)', 'we used an implementation that is shared by the authors of (Geng et al. 2021)'), but does not explicitly state that their own source code for the described methodology is publicly available or provide a link to it. |
| Open Datasets | Yes | We used two recent video QA datasets for evaluating our task, namely NExT-QA (Xiao et al. 2021) and AVSD-QA (Alamri et al. 2019a). |
| Dataset Splits | Yes | NExT-QA Dataset ... consists of 3,870 training, 570 validation, and 1,000 test videos. The dataset provides 34,132, 4,996, and 8,564 multiple choice questions in the training, validation, and test sets respectively... AVSD-QA ... to use 7,985, 1,863, and 1,968 clips for training, validation, and test. |
| Hardware Specification | Yes | our experiments show that the time taken for every training iteration in this case slows down 4-fold (from 1.5 s per iteration to 6 s on a single RTX6000 GPU). |
| Software Dependencies | No | The paper mentions specific models and frameworks (e.g., 'Faster R-CNN', 'MiDaS model', 'I3D action recognition neural network', 'BERT features') but does not provide version numbers for these or any other ancillary software components (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | For NExT-QA, we used a learning rate of 5e-5 as suggested in the paper with a batch size of 64 and trained for 50 epochs, while AVSD-QA used a learning rate of 1e-3 and a batch size of 100, and trained for 20 epochs. ... For the Transformer, we used a 4-headed attention for NExT-QA, and a 2-headed attention for AVSD-QA. |
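For quick reference, the quoted hyperparameters can be gathered into a single configuration sketch. This is a minimal illustration in Python; the dictionary layout and key names are our own assumptions, and only the numeric values (learning rates, batch sizes, epoch counts, attention heads) come from the setup reported in the table above.

```python
# Training hyperparameters as reported for the two benchmarks.
# The structure and key names below are illustrative assumptions;
# only the numeric values are taken from the paper's stated setup.
TRAIN_CONFIG = {
    "NExT-QA": {
        "learning_rate": 5e-5,  # "as suggested in the paper"
        "batch_size": 64,
        "epochs": 50,
        "attention_heads": 4,   # Transformer attention heads
    },
    "AVSD-QA": {
        "learning_rate": 1e-3,
        "batch_size": 100,
        "epochs": 20,
        "attention_heads": 2,
    },
}

if __name__ == "__main__":
    # Print each dataset's reported configuration.
    for dataset, cfg in TRAIN_CONFIG.items():
        print(dataset, cfg)
```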