GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
Authors: Yihe Deng, Chenchen Ye, Zijie Huang, Mingyu Derek Ma, Yiwen Kou, Wei Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present comprehensive evaluations across commonsense reasoning QA benchmarks, where GraphVis provides an average improvement of 11.1% over its base model and outperforms existing KG-enhanced LLM approaches. Across VQA benchmarks such as ScienceQA that share similar scientific diagram images, GraphVis provides a notable gain of 4.32%. We present experiment results of GraphVis on enhancing commonsense reasoning tasks... |
| Researcher Affiliation | Academia | Yihe Deng Chenchen Ye Zijie Huang Mingyu Derek Ma Yiwen Kou Wei Wang University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1 GraphVis |
| Open Source Code | Yes | Code is made available on GitHub. Codes and scripts are provided in the supplemental material. |
| Open Datasets | Yes | We consider ConceptNet (Speer et al., 2017), a commonsense knowledge graph, as the KG used in our experiments. We consider CommonsenseQA (CSQA) (Talmor et al., 2019) and OpenBookQA (OBQA) (Mihaylov et al., 2018) as the commonsense reasoning tasks... For the zero-shot VQA tasks, we consider ScienceQA (Lu et al., 2022), MMBench (Liu et al., 2023c) and POPE (Li et al., 2023b)... |
| Dataset Splits | No | The paper mentions using 'training data' and 'fine-tuning' but does not explicitly provide details about training/validation/test splits for its own experimental process, beyond mentioning test sets for evaluation benchmarks. |
| Hardware Specification | Yes | Experiments of this paper were all conducted on NVIDIA RTX A6000 GPU clusters. |
| Software Dependencies | No | The paper mentions 'llava-v1.6-mistral-7b' as the base model and 'Graphviz tool' but does not specify version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We present the fine-tuning hyperparameters of GraphVis in Table 5. Table 5 includes: lora_r 128, lora_alpha 256, lora_target all, learning rate 1e-7, optimizer AdamW, global batch size 4, gradient_accumulation_steps 1, weight_decay 0, warmup_ratio 0.03, lr_scheduler_type cosine, image_aspect_ratio pad, group_by_modality_length True, model_max_length 2048, mm_projector_lr 2e-5, mm_projector_type mlp2x_gelu. |
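The Table 5 hyperparameters above can be collected into a single config for reference. This is a hypothetical sketch only: the key names follow common LLaVA/LoRA training-script conventions and are assumptions, not the authors' actual argument names, but the values are taken directly from the table.

```python
# Hypothetical config sketch of the GraphVis fine-tuning hyperparameters (Table 5).
# Key names are assumed (common LLaVA/LoRA conventions); values come from the paper.
finetune_config = {
    "lora_r": 128,
    "lora_alpha": 256,
    "lora_target": "all",
    "learning_rate": 1e-7,
    "optimizer": "AdamW",
    "global_batch_size": 4,
    "gradient_accumulation_steps": 1,
    "weight_decay": 0,
    "warmup_ratio": 0.03,
    "lr_scheduler_type": "cosine",
    "image_aspect_ratio": "pad",
    "group_by_modality_length": True,
    "model_max_length": 2048,
    "mm_projector_lr": 2e-5,
    "mm_projector_type": "mlp2x_gelu",
}

# The effective LoRA scaling factor is alpha / r, a common sanity check
# when reading a LoRA setup: here 256 / 128 = 2.0.
lora_scaling = finetune_config["lora_alpha"] / finetune_config["lora_r"]
print(lora_scaling)  # 2.0
```

Note the two learning rates: 1e-7 for the LoRA adapters and a separate, much larger 2e-5 for the multimodal projector (`mm_projector_lr`), which is standard in LLaVA-style fine-tuning.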