Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

Authors: Min Peng, Chongyang Wang, Yuan Gao, Yu Shi, Xiang-Dong Zhou

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on three Video QA datasets, we demonstrate improved performances than previous state-of-the-arts and justify the effectiveness of each part of our method.
Researcher Affiliation | Academia | Min Peng (1,2), Chongyang Wang (3), Yuan Gao (4), Yu Shi (2) and Xiang-Dong Zhou (2); 1: University of Chinese Academy of Sciences; 2: Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences; 3: University College London; 4: Shenzhen Institute of Artificial Intelligence and Robotics for Society
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide a statement or link indicating that the code for the described method is open-sourced.
Open Datasets | Yes | Three Video QA benchmarks are adopted for our evaluation. TGIF-QA [Jang et al., 2017] is a large-scale dataset for video QA... MSVD-QA [Xu et al., 2017] comprises 1,970 short clips... MSRVTT-QA [Xu et al., 2017] comprises 10K videos...
Dataset Splits | Yes | We use the official split of training, validation, and testing sets of each dataset.
Hardware Specification | Yes | We implement the method with the PyTorch deep learning library on a PC with two GTX 1080 Ti GPUs.
Software Dependencies | No | The paper mentions the 'PyTorch deep learning library' but does not specify a version number for it or for any other software dependency.
Experiment Setup | Yes | By default, the maximum scale N is set to 3... For each multimodal interaction block in the RMI module, the feature dimension d is set to 512 and the number of attentional heads H is set to 8. The mini-batch size is set to 32, with the maximum number of epochs set to 20. The Adam [Kingma and Ba, 2015] optimizer is used, with the initial learning rate set to 1e-4, which is halved when the loss stops decreasing after every 10 epochs.
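The Experiment Setup row lists the training hyperparameters reported in the paper (feature dimension 512, 8 attention heads, mini-batch size 32, up to 20 epochs, Adam with an initial learning rate of 1e-4 halved when the loss plateaus). Below is a minimal sketch, assuming PyTorch, of how such an optimizer and learning-rate schedule could be configured; the model and data loader are hypothetical placeholders, and the plateau patience of 10 epochs is an assumption based on the paper's wording, not a released implementation.

```python
import torch
from torch import nn, optim

# Hypothetical stand-in for the paper's multilevel hierarchical network;
# the real architecture is not released, so a placeholder module is used here.
model = nn.Linear(512, 512)

# Hyperparameters as reported in the paper's experiment setup.
BATCH_SIZE = 32      # mini-batch size
MAX_EPOCHS = 20      # maximum number of epochs
INITIAL_LR = 1e-4    # Adam initial learning rate
FEATURE_DIM = 512    # feature dimension d of each multimodal interaction block
NUM_HEADS = 8        # number of attentional heads H

optimizer = optim.Adam(model.parameters(), lr=INITIAL_LR)

# "Reduces by half when the loss stops decreasing after every 10 epochs" is read
# here as a plateau-based schedule; the exact criterion is not specified in the
# paper, so patience=10 is an assumption.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)

for epoch in range(MAX_EPOCHS):
    epoch_loss = 0.0
    # `train_loader` would iterate over one of the Video QA datasets
    # (TGIF-QA, MSVD-QA or MSRVTT-QA); it is omitted because the paper's
    # preprocessing pipeline is not described in reproducible detail.
    # for batch in train_loader:
    #     loss = compute_loss(model, batch)   # hypothetical loss helper
    #     optimizer.zero_grad()
    #     loss.backward()
    #     optimizer.step()
    #     epoch_loss += loss.item()
    scheduler.step(epoch_loss)
```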