GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
Authors: Yanbin Wei, Shuai Fu, Weisen Jiang, Zejian Zhang, Zhixiong Zeng, Qi Wu, James Kwok, Yu Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the GVLQA dataset and five real-world datasets show that GITA outperforms mainstream LLMs on general graph reasoning. Moreover, experimental results demonstrate the effectiveness of the layout augmentation on visual graphs and pretraining on the GVLQA dataset. |
| Researcher Affiliation | Collaboration | Department of Computer Science and Engineering, Southern University of Science and Technology; Department of Computer Science and Engineering, Hong Kong University of Science and Technology; Australian Institute for Machine Learning, University of Adelaide; Tencent |
| Pseudocode | No | The paper describes the GITA framework's components and processes in text and flow diagrams (Figure 1), but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code Repository: https://github.com/WEIYanbin1999/GITA/. |
| Open Datasets | Yes | Dataset: https://huggingface.co/collections/Yanbin99/. |
| Dataset Splits | Yes | For each dataset, 80%/10%/10% of the edges are randomly used for training/validation/testing, respectively. (An illustrative split sketch appears below the table.) |
| Hardware Specification | Yes | All fine-tuning experiments are conducted on an NVIDIA DGX station with 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions tools such as Graphviz, Matplotlib, and NetworkX for graph visualization, and frameworks such as LoRA, but it does not provide version numbers for any of these software dependencies. (An illustrative rendering sketch appears below the table.) |
| Experiment Setup | Yes | For all fine-tuning experiments, we use a batch size of 128 and adopt the AdamW optimizer (with learning rates of 0.0002 and 0.00002 for the LoRA adapters within the text decoder and the vision-to-text projector, respectively). (An illustrative optimizer sketch appears below the table.) |
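
The Dataset Splits row above describes a random 80%/10%/10% edge split per dataset. The snippet below is a minimal sketch of such a split using NetworkX; it is not the authors' released preprocessing code, and the function name, seed, and example graph are illustrative assumptions.

```python
import random
import networkx as nx


def split_edges(graph: nx.Graph, seed: int = 0):
    """Randomly partition a graph's edges into 80%/10%/10%
    train/validation/test subsets (illustrative sketch only)."""
    edges = list(graph.edges())
    random.Random(seed).shuffle(edges)

    n = len(edges)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)

    train_edges = edges[:n_train]
    val_edges = edges[n_train:n_train + n_val]
    test_edges = edges[n_train + n_val:]
    return train_edges, val_edges, test_edges


if __name__ == "__main__":
    g = nx.karate_club_graph()  # small example graph, not a GVLQA dataset
    train, val, test = split_edges(g)
    print(len(train), len(val), len(test))
```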
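The Software Dependencies row notes that Graphviz, Matplotlib, and NetworkX are used for graph visualization but gives no versions. The sketch below shows one common way to render a graph to an image with NetworkX and Matplotlib; the layout algorithm, styling, and output path are assumptions and do not reproduce GITA's actual visual-graph generation.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display required
import matplotlib.pyplot as plt
import networkx as nx


def render_visual_graph(graph: nx.Graph, path: str = "graph.png"):
    """Render a graph to an image file with a spring layout
    (layout, colors, and sizes are illustrative choices)."""
    pos = nx.spring_layout(graph, seed=0)
    plt.figure(figsize=(4, 4))
    nx.draw(graph, pos, with_labels=True, node_color="lightblue",
            edge_color="gray", node_size=300, font_size=8)
    plt.savefig(path, bbox_inches="tight", dpi=200)
    plt.close()


if __name__ == "__main__":
    render_visual_graph(nx.cycle_graph(8))
```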
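The Experiment Setup row reports AdamW with a learning rate of 2e-4 for the LoRA adapters in the text decoder and 2e-5 for the vision-to-text projector. The PyTorch sketch below shows how such per-module learning rates can be expressed with optimizer parameter groups; the name patterns used to select parameters ("lora_", "mm_projector") are assumptions about module naming, not taken from the paper or repository.

```python
import torch


def build_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    """Create AdamW with two parameter groups at the learning rates
    reported above. Module-name patterns are hypothetical."""
    lora_params, projector_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "lora_" in name:            # assumed LoRA adapter naming
            lora_params.append(param)
        elif "mm_projector" in name:   # assumed vision-to-text projector naming
            projector_params.append(param)
    return torch.optim.AdamW([
        {"params": lora_params, "lr": 2e-4},
        {"params": projector_params, "lr": 2e-5},
    ])
```

Using parameter groups lets a single optimizer apply different learning rates to different submodules, so no separate optimizers are needed for the adapters and the projector.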