Heterogeneous Graph Learning for Visual Commonsense Reasoning

Authors: Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the large-scale Visual Commonsense Reasoning benchmark demonstrate the superior performance of our proposed modules on three tasks (improving accuracy by 5% on Q→A, 3.5% on QA→R, and 5.8% on Q→AR).
Researcher Affiliation | Academia | ¹School of Data and Computer Science, Sun Yat-sen University; ²School of Intelligent Systems Engineering, Sun Yat-sen University
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is released at https://github.com/yuweijiang/HGL-pytorch
Open Datasets | Yes | We carry out extensive experiments on the VCR [44] benchmark, a representative large-scale visual commonsense reasoning dataset.
Dataset Splits | Yes | The dataset is officially split into a training set of 80,418 images with 212,923 questions, a validation set of 9,929 images with 26,534 questions, and a test set of 9,557 images with 25,263 queries. We follow this data partition in all experiments.
Hardware Specification | Yes | We conduct all experiments using 8 GeForce GTX TITAN XP cards on a single server.
Software Dependencies | No | The paper mentions using PyTorch, ResNet-50, and BERT but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | The batch size is set to 96, with 12 images on each GPU. The training hyper-parameters mostly follow R2C [44]. We train our model with a multi-class cross-entropy loss between the prediction and the label. For all training, Adam [23] with weight decay 0.0001 and beta 0.9 is adopted to optimize all models. The initial learning rate is 0.0002, halved (×0.5) whenever the validation accuracy has not increased for two epochs. We train all models for 20 epochs from scratch in an end-to-end manner.
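
To make the quoted recipe concrete, the sketch below assembles the details from the "Dataset Splits", "Hardware Specification", and "Experiment Setup" rows into a minimal, self-contained PyTorch training loop. It is an illustrative sketch only: the linear stand-in model, the 512-dimensional dummy features, and the random labels are placeholders (the actual HGL model is in the released repository), and only beta1 = 0.9 is stated in the paper, so beta2 is left at the PyTorch default. The learning-rate schedule is expressed with `ReduceLROnPlateau`, which matches the described behavior of halving the rate when validation accuracy plateaus for two epochs.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Official VCR split sizes quoted in the "Dataset Splits" row.
VCR_SPLITS = {
    "train": {"images": 80_418, "questions": 212_923},
    "val":   {"images": 9_929,  "questions": 26_534},
    "test":  {"images": 9_557,  "queries": 25_263},
}

# "Experiment Setup" / "Hardware Specification" rows: batch 96 = 8 GPUs x 12 images.
BATCH_SIZE = 96
NUM_EPOCHS = 20

# Placeholder model: VCR poses 4 candidate answers per question.
# The real HGL model is released at https://github.com/yuweijiang/HGL-pytorch
model = nn.Linear(512, 4)

# Multi-class cross entropy between prediction and label.
criterion = nn.CrossEntropyLoss()

# Adam with weight decay 1e-4 and beta1 0.9 (beta2 is the PyTorch default,
# as the paper states only a single beta); initial learning rate 2e-4.
optimizer = Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999), weight_decay=1e-4)

# Halve the learning rate when validation accuracy has not improved for two epochs.
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=2)

for epoch in range(NUM_EPOCHS):
    model.train()
    # Dummy batch standing in for fused image/question features and labels.
    features = torch.randn(BATCH_SIZE, 512)
    labels = torch.randint(0, 4, (BATCH_SIZE,))
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()

    # Dummy validation accuracy; replace with a real evaluation pass.
    model.eval()
    with torch.no_grad():
        val_accuracy = (model(features).argmax(dim=1) == labels).float().mean().item()
    scheduler.step(val_accuracy)
```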