SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Authors: Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang

AAAI 2022, pp. 5914-5922 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on VCR and other tasks show significant performance boost compared with the state-of-the-art methods, and prove the efficacy of each proposed component. [Section 4, Experiments] In this section, we analyze different components of our framework and compare the performance with the SOTA methods. [Section 4.1, Ablation Study] We show the effectiveness of the proposed methods on the validation set of VCR. In Tab. 1, we show the experimental results of the three proposed components: the multi-hop graph Transformer (Hop Trans), scene-graph-aware pretraining (Pretrain-V), and semantically relevant scene graphs generated by Text-VSPNet trained with the proposed strategy (Scene Graph+).
Researcher Affiliation | Academia | 1 Columbia University; 2 University of California, Los Angeles
Pseudocode | No | The paper includes mathematical equations (1-6) but no explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Extensive experiments on VCR and other tasks show significant performance boost compared with the state-of-the-art methods... Visual Commonsense Reasoning (Zellers et al. 2019)... VSPNet was originally trained on Visual Genome (VG) (Krishna et al. 2017)... experiments on GQA and SNLI-VE datasets in Tab. 3. It is important to note that we focus on validating the generalized advantage of our method across different datasets... The domain of GQA is very close to Visual Genome, where (Zellers et al. 2018) is trained.
Dataset Splits | Yes | We show the effectiveness of the proposed methods on the validation set of VCR.
Hardware Specification | No | The paper mentions replacing an object detector with a stronger one (Anderson et al. 2018) but does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as Python versions, specific libraries, or frameworks.
Experiment Setup | No | The paper describes the model architecture and training strategies but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or other detailed experimental setup configurations in the main text.