GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning

Authors: Yanbin Wei, Shuai Fu, Weisen Jiang, Zejian Zhang, Zhixiong Zeng, Qi Wu, James Kwok, Yu Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the GVLQA dataset and five real-world datasets show that GITA outperforms mainstream LLMs on general graph reasoning. Moreover, experimental results demonstrate the effectiveness of the layout augmentation on visual graphs and pretraining on the GVLQA dataset.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Southern University of Science and Technology; 2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology; 3 Australian Institute for Machine Learning, University of Adelaide; 4 Tencent
Pseudocode | No | The paper describes the GITA framework's components and processes in text and flow diagrams (Figure 1), but it does not include formal pseudocode blocks or algorithms.
Open Source Code | Yes | Code Repository: https://github.com/WEIYanbin1999/GITA/
Open Datasets | Yes | Dataset: https://huggingface.co/collections/Yanbin99/
Dataset Splits | Yes | For each dataset, 80%/10%/10% of the edges are randomly used for training/validation/testing, respectively. (A minimal split sketch follows the table.)
Hardware Specification | Yes | All fine-tuning experiments are conducted on an NVIDIA DGX station with 8 A100 GPUs.
Software Dependencies | No | The paper mentions Graphviz, Matplotlib, and NetworkX for graph visualization and LoRA for fine-tuning, but it does not provide version numbers for any of these software dependencies. (A version-recording visualization sketch follows the table.)
Experiment Setup | Yes | For all fine-tuning experiments, we use a batch size of 128 and adopt the AdamW optimizer (with learning rates of 0.0002 and 0.00002 for the LoRA adapters within the text decoder and for the vision-to-text projector, respectively). (An optimizer sketch follows the table.)
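
The dataset-split row above reports a random 80%/10%/10% edge split per dataset. The following is a minimal sketch of such a split, assuming a NetworkX graph as input; the function and variable names are illustrative and not taken from the GITA codebase.

```python
# Minimal sketch of an 80%/10%/10% random edge split (illustrative; not the authors' code).
import random
import networkx as nx

def split_edges(graph: nx.Graph, seed: int = 0):
    """Shuffle the edges and assign 80%/10%/10% to train/validation/test."""
    edges = list(graph.edges())
    random.Random(seed).shuffle(edges)
    n_train = int(0.8 * len(edges))
    n_val = int(0.1 * len(edges))
    train = edges[:n_train]
    val = edges[n_train:n_train + n_val]
    test = edges[n_train + n_val:]
    return train, val, test

# Example usage on a small built-in graph.
train_edges, val_edges, test_edges = split_edges(nx.karate_club_graph())
```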
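The software-dependencies row notes that Graphviz, Matplotlib, and NetworkX are named without versions. The sketch below assumes NetworkX and Matplotlib are used to render visual graphs: it records the installed library versions and draws one graph under several layouts, loosely mirroring the layout-variation idea. The choice of graph, layouts, and output file names is an assumption, not the GITA pipeline.

```python
# Illustrative sketch: record library versions and render one graph under several layouts.
# The graph, layouts, and output names are assumptions, not the GITA visualization code.
import matplotlib
matplotlib.use("Agg")  # headless backend for saving figures
import matplotlib.pyplot as plt
import networkx as nx

print("networkx", nx.__version__, "| matplotlib", matplotlib.__version__)

g = nx.erdos_renyi_graph(n=12, p=0.3, seed=0)
layouts = {"spring": nx.spring_layout, "circular": nx.circular_layout, "shell": nx.shell_layout}
for name, layout in layouts.items():
    pos = layout(g)  # node positions under this layout
    plt.figure(figsize=(4, 4))
    nx.draw(g, pos, with_labels=True, node_color="lightsteelblue")
    plt.savefig(f"graph_{name}.png", dpi=150)
    plt.close()
```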
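The experiment-setup row quotes a single AdamW optimizer with a learning rate of 2e-4 for the LoRA adapters in the text decoder and 2e-5 for the vision-to-text projector. Below is a hedged PyTorch sketch of that configuration; the name-based parameter filters ("lora_", "projector") are assumptions about how the modules are named and may differ from the released code.

```python
# Hedged sketch of the quoted optimizer setup: one AdamW instance with
# lr=2e-4 for LoRA adapter weights and lr=2e-5 for the vision-to-text projector.
# The name-based filters below are assumptions, not the GITA implementation.
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    lora_params, projector_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue                      # skip frozen backbone weights
        if "lora_" in name:               # LoRA adapters inside the text decoder
            lora_params.append(param)
        elif "projector" in name:         # vision-to-text projector
            projector_params.append(param)
    return torch.optim.AdamW([
        {"params": lora_params, "lr": 2e-4},
        {"params": projector_params, "lr": 2e-5},
    ])
```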