GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning

Authors: Yanbin Wei, Shuai Fu, Weisen Jiang, Zejian Zhang, Zhixiong Zeng, Qi Wu, James Kwok, Yu Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the GVLQA dataset and five real-world datasets show that GITA outperforms mainstream LLMs on general graph reasoning. Moreover, experimental results demonstrate the effectiveness of the layout augmentation on visual graphs and pretraining on the GVLQA dataset.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Southern University of Science and Technology; 2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology; 3 Australian Institute for Machine Learning, University of Adelaide; 4 Tencent
Pseudocode | No | The paper describes the GITA framework's components and processes in text and flow diagrams (Figure 1), but it does not include formal pseudocode blocks or algorithms.
Open Source Code | Yes | Code Repository: https://github.com/WEIYanbin1999/GITA/
Open Datasets | Yes | Dataset: https://huggingface.co/collections/Yanbin99/
Dataset Splits | Yes | For each dataset, 80%/10%/10% of the edges are randomly used for training/validation/testing, respectively. (A minimal split sketch follows the table.)
Hardware Specification | Yes | All fine-tuning experiments are conducted on an NVIDIA DGX station with 8 A100 GPUs.
Software Dependencies | No | The paper mentions Graphviz, Matplotlib, and NetworkX for graph visualization and LoRA for fine-tuning, but it does not provide version numbers for any of these software dependencies. (A version-recording visualization sketch follows the table.)
Experiment Setup | Yes | For all fine-tuning experiments, we use a batch size of 128 and adopt the AdamW optimizer (with learning rates of 0.0002 and 0.00002 for the LoRA adapters within the text decoder and for the vision-to-text projector, respectively). (An optimizer sketch follows the table.)
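
The dataset-split row above reports a random 80%/10%/10% edge split per dataset. The following is a minimal sketch of such a split, assuming a NetworkX graph as input; the function and variable names are illustrative and not taken from the GITA codebase.

```python
# Minimal sketch of an 80%/10%/10% random edge split (illustrative; not the authors' code).
import random
import networkx as nx

def split_edges(graph: nx.Graph, seed: int = 0):
    """Shuffle the edges and assign 80%/10%/10% to train/validation/test."""
    edges = list(graph.edges())
    random.Random(seed).shuffle(edges)
    n_train = int(0.8 * len(edges))
    n_val = int(0.1 * len(edges))
    train = edges[:n_train]
    val = edges[n_train:n_train + n_val]
    test = edges[n_train + n_val:]
    return train, val, test

# Example usage on a small built-in graph.
train_edges, val_edges, test_edges = split_edges(nx.karate_club_graph())
```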
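The software-dependencies row notes that Graphviz, Matplotlib, and NetworkX are named without versions. The sketch below assumes NetworkX and Matplotlib are used to render visual graphs: it records the installed library versions and draws one graph under several layouts, loosely mirroring the layout-variation idea. The choice of graph, layouts, and output file names is an assumption, not the GITA pipeline.

```python
# Illustrative sketch: record library versions and render one graph under several layouts.
# The graph, layouts, and output names are assumptions, not the GITA visualization code.
import matplotlib
matplotlib.use("Agg")  # headless backend for saving figures
import matplotlib.pyplot as plt
import networkx as nx

print("networkx", nx.__version__, "| matplotlib", matplotlib.__version__)

g = nx.erdos_renyi_graph(n=12, p=0.3, seed=0)
layouts = {"spring": nx.spring_layout, "circular": nx.circular_layout, "shell": nx.shell_layout}
for name, layout in layouts.items():
    pos = layout(g)  # node positions under this layout
    plt.figure(figsize=(4, 4))
    nx.draw(g, pos, with_labels=True, node_color="lightsteelblue")
    plt.savefig(f"graph_{name}.png", dpi=150)
    plt.close()
```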
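The experiment-setup row quotes a single AdamW optimizer with a learning rate of 2e-4 for the LoRA adapters in the text decoder and 2e-5 for the vision-to-text projector. Below is a hedged PyTorch sketch of that configuration; the name-based parameter filters ("lora_", "projector") are assumptions about how the modules are named and may differ from the released code.

```python
# Hedged sketch of the quoted optimizer setup: one AdamW instance with
# lr=2e-4 for LoRA adapter weights and lr=2e-5 for the vision-to-text projector.
# The name-based filters below are assumptions, not the GITA implementation.
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    lora_params, projector_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue                      # skip frozen backbone weights
        if "lora_" in name:               # LoRA adapters inside the text decoder
            lora_params.append(param)
        elif "projector" in name:         # vision-to-text projector
            projector_params.append(param)
    return torch.optim.AdamW([
        {"params": lora_params, "lr": 2e-4},
        {"params": projector_params, "lr": 2e-5},
    ])
```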