Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Heterogeneous Graph Learning for Visual Commonsense Reasoning
Authors: Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the large-scale Visual Commonsense Reasoning benchmark demonstrate the superior performance of our proposed modules on three tasks (improving 5% accuracy on Q→A, 3.5% on QA→R, 5.8% on Q→AR). |
| Researcher Affiliation | Academia | ¹School of Data and Computer Science, Sun Yat-sen University; ²School of Intelligent Systems Engineering, Sun Yat-sen University |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is released in https://github.com/yuweijiang/HGL-pytorch |
| Open Datasets | Yes | We carry out extensive experiments on VCR [44] benchmark, a representative large-scale visual commonsense reasoning dataset |
| Dataset Splits | Yes | The dataset is officially split into a training set consisting of 80,418 images with 212,923 questions, a validation set containing 9,929 images with 26,534 questions and a test set made up of 9,557 with 25,263 queries. We follow this data partition in all experiments. |
| Hardware Specification | Yes | We conduct all experiments using 8 GeForce GTX TITAN XP cards on a single server. |
| Software Dependencies | No | The paper mentions using PyTorch, ResNet-50, and BERT but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The batch size is set to 96 with 12 images on each GPU. The hyper-parameters in training mostly follow R2C [44]. We train our model by utilizing multi-class cross entropy between the prediction and label. For all training, Adam [23] with weight decay of 0.0001 and beta of 0.9 is adopted to optimize all models. The initial learning rate is 0.0002, reducing half (×0.5) for two epochs when the validation accuracy is not increasing. We train 20 epochs for all models from scratch in an end-to-end manner. |
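
The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration, which also makes the batch-size arithmetic (8 GPUs × 12 images = 96) and the learning-rate rule explicit. The following is a minimal, framework-free sketch and not the authors' code: the names `TrainConfig` and `lr_schedule` are hypothetical, "beta of 0.9" is assumed to refer to Adam's first-moment coefficient, and the schedule assumes that the rate is halved after two consecutive epochs without an improvement in validation accuracy, which is only one possible reading of the quoted sentence.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TrainConfig:
    """Values quoted in the table; field names are illustrative only."""
    num_gpus: int = 8              # 8 GPUs on a single server
    images_per_gpu: int = 12       # 12 images on each GPU
    initial_lr: float = 2e-4       # initial learning rate 0.0002
    weight_decay: float = 1e-4     # Adam weight decay 0.0001
    adam_beta1: float = 0.9        # assumption: "beta" means the first-moment beta
    lr_decay_factor: float = 0.5   # learning rate is halved
    patience_epochs: int = 2       # assumption: two stale epochs trigger the halving
    num_epochs: int = 20           # 20 training epochs

    @property
    def batch_size(self) -> int:
        # Total batch size across all GPUs.
        return self.num_gpus * self.images_per_gpu


def lr_schedule(val_accuracies: Sequence[float], cfg: TrainConfig) -> List[float]:
    """Return the learning rate used in each epoch.

    val_accuracies[i] is the validation accuracy measured at the end of
    epoch i; the rate for epoch i + 1 depends on the history up to epoch i.
    """
    lr = cfg.initial_lr
    best = float("-inf")
    stale = 0
    rates = []
    for acc in val_accuracies[: cfg.num_epochs]:
        rates.append(lr)
        if acc > best:
            best = acc
            stale = 0
        else:
            stale += 1
            if stale >= cfg.patience_epochs:
                lr *= cfg.lr_decay_factor
                stale = 0
    return rates


cfg = TrainConfig()
# Made-up validation accuracies, used only to exercise the schedule.
example_rates = lr_schedule([0.60, 0.62, 0.62, 0.61, 0.63, 0.63], cfg)
print(cfg.batch_size, example_rates)
```

In this example the accuracy fails to improve in the third and fourth epochs, so the rate drops from 0.0002 to 0.0001 starting with the fifth epoch. Plateau schedulers in deep learning frameworks may count stale epochs differently, so the exact epoch at which the released code reduces the rate could differ from this sketch.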