Reasoning with Heterogeneous Graph Alignment for Video Question Answering

Authors: Pin Jiang, Yahong Han

AAAI 2020, pp. 11109-11116

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on three benchmark datasets and conduct an extensive ablation study on the effectiveness of the network architecture. Experiments show the network to be superior in quality.
Researcher Affiliation | Academia | Pin Jiang, Yahong Han, College of Intelligence and Computing, Tianjin University, Tianjin, China; {jpin, yahong}@tju.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit code-release statement) for its source code.
Open Datasets | Yes | TGIF-QA is a widely used large-scale benchmark dataset for Video QA (Jang et al. 2017)... MSVD-QA and MSRVTT-QA are two datasets generated from video descriptions through an automatic method (Xu et al. 2017).
Dataset Splits | Yes | These datasets provide a standard partition into training, validation, and testing sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using pre-trained models such as ResNet-152, C3D, VGG, and GloVe word embeddings, but it does not specify software components with version numbers (e.g., programming language, deep learning framework, or library versions).
Experiment Setup | Yes | In terms of training details, we set the number of hidden units d to 512. Batch size is set to 64. We use Adam as the optimizer with an initial learning rate of 10^-4. The dropout rate is set to 0.3. For better performance, we use some general training strategies, including early stopping, learning-rate warm-up, and learning-rate cosine annealing.
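As a companion to the Software Dependencies row above: the paper names standard pre-trained components (ResNet-152, C3D, VGG, GloVe) but no versions, so any re-implementation has to pick its own stack. The sketch below shows one plausible way to extract frame-level appearance features with an ImageNet-pretrained ResNet-152; the PyTorch/torchvision choice, input shapes, and frame count are assumptions, and the C3D motion features and GloVe word vectors mentioned in the paper would come from their own pretrained checkpoints and files (not shown).

# Illustrative sketch only (assumed PyTorch/torchvision stack, not the authors' code):
# frame-level appearance features from an ImageNet-pretrained ResNet-152.
import torch
from torchvision.models import resnet152, ResNet152_Weights

backbone = resnet152(weights=ResNet152_Weights.IMAGENET1K_V1)
backbone.eval()

# Drop the final classification layer so the network returns 2048-d pooled features.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])

frames = torch.randn(16, 3, 224, 224)  # 16 sampled frames (dummy input; real frames would be preprocessed)
with torch.no_grad():
    appearance_feats = feature_extractor(frames).flatten(1)  # shape: (16, 2048)

# C3D clip features and GloVe word embeddings are not part of torchvision and
# would be loaded from their own pretrained checkpoints / embedding files.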
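The Experiment Setup row is concrete enough to turn into a configuration sketch. Below is a minimal PyTorch training-setup sketch using the reported values (d = 512 hidden units, batch size 64, Adam with initial learning rate 10^-4, dropout 0.3, learning-rate warm-up followed by cosine annealing); the placeholder model, warm-up length, and total epoch count are my own assumptions, since the paper does not state them.

# Minimal sketch of the reported training configuration; the model and schedule
# lengths are placeholders (assumptions), only the hyperparameter values below
# come from the paper.
import math
import torch
import torch.nn as nn

HIDDEN_UNITS = 512    # number of hidden units d (from the paper)
BATCH_SIZE = 64       # from the paper; would be passed to a DataLoader
INIT_LR = 1e-4        # Adam initial learning rate (from the paper)
DROPOUT = 0.3         # from the paper
WARMUP_EPOCHS = 5     # assumption: warm-up length is not given
TOTAL_EPOCHS = 50     # assumption: total epochs are not given

# Placeholder model standing in for the heterogeneous graph alignment network.
model = nn.Sequential(
    nn.Linear(HIDDEN_UNITS, HIDDEN_UNITS),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
    nn.Linear(HIDDEN_UNITS, HIDDEN_UNITS),
)

optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR)

def lr_lambda(epoch: int) -> float:
    """Linear warm-up followed by cosine annealing of the learning rate."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / max(1, TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Early stopping (also mentioned in the paper) would monitor validation accuracy
# across epochs and halt training when it stops improving (not shown here).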