Heterogeneous Graph Learning for Visual Commonsense Reasoning
Authors: Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the large-scale Visual Commonsense Reasoning benchmark demonstrate the superior performance of our proposed modules on three tasks (improving 5% accuracy on Q→A, 3.5% on QA→R, 5.8% on Q→AR). |
| Researcher Affiliation | Academia | 1School of Data and Computer Science, Sun Yat-sen University 2School of Intelligent Systems Engineering, Sun Yat-sen University |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released in https://github.com/yuweijiang/HGL-pytorch |
| Open Datasets | Yes | We carry out extensive experiments on VCR [44] benchmark, a representative large-scale visual commonsense reasoning dataset |
| Dataset Splits | Yes | The dataset is officially split into a training set consisting of 80,418 images with 212,923 questions, a validation set containing 9,929 images with 26,534 questions, and a test set made up of 9,557 images with 25,263 queries. We follow this data partition in all experiments. |
| Hardware Specification | Yes | We conduct all experiments using 8 GeForce GTX TITAN XP cards on a single server. |
| Software Dependencies | No | The paper mentions using PyTorch, ResNet-50, and BERT but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The batch size is set to 96, with 12 images on each GPU. The hyper-parameters in training mostly follow R2C [44]. We train our model using multi-class cross entropy between the prediction and the label. For all training, Adam [23] with a weight decay of 0.0001 and a beta of 0.9 is adopted to optimize all models. The initial learning rate is 0.0002, halved (×0.5) whenever the validation accuracy has not increased for two epochs. We train all models for 20 epochs from scratch in an end-to-end manner. |
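The learning-rate schedule in the setup row (start at 0.0002, halve when validation accuracy stalls for two epochs) can be sketched in plain Python. This is an illustrative reconstruction of the rule as stated, not the authors' code (which is in the linked HGL-pytorch repository); the function name `schedule_lr` and the example accuracy values are hypothetical.

```python
def schedule_lr(val_accuracies, initial_lr=2e-4, factor=0.5, patience=2):
    """Return the learning rate used in each epoch, halving the rate
    whenever validation accuracy has not improved for `patience` epochs."""
    lr, best, bad_epochs = initial_lr, float("-inf"), 0
    lrs = []
    for acc in val_accuracies:
        lrs.append(lr)          # rate in effect for this epoch
        if acc > best:          # accuracy improved: reset the counter
            best, bad_epochs = acc, 0
        else:                   # no improvement this epoch
            bad_epochs += 1
            if bad_epochs >= patience:
                lr *= factor    # halve the learning rate
                bad_epochs = 0
    return lrs

# Accuracy stalls after epoch 2, so the rate halves after two flat epochs.
print(schedule_lr([0.55, 0.60, 0.60, 0.60, 0.61]))
# → [0.0002, 0.0002, 0.0002, 0.0002, 0.0001]
```

In PyTorch this behavior corresponds to `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=2)`, which is the likely mechanism behind the description, though the paper does not name it.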