GraphVis: Boosting LLMs with Visual Knowledge Graph Integration

Authors: Yihe Deng, Chenchen Ye, Zijie Huang, Mingyu Derek Ma, Yiwen Kou, Wei Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present comprehensive evaluations across commonsense reasoning QA benchmarks, where GraphVis provides an average improvement of 11.1% over its base model and outperforms existing KG-enhanced LLM approaches. Across VQA benchmarks such as ScienceQA that share similar scientific diagram images, GraphVis provides a notable gain of 4.32%. We present experiment results of GraphVis on enhancing commonsense reasoning tasks...
Researcher Affiliation | Academia | Yihe Deng, Chenchen Ye, Zijie Huang, Mingyu Derek Ma, Yiwen Kou, Wei Wang (University of California, Los Angeles)
Pseudocode | Yes | Algorithm 1: GraphVis
Open Source Code | Yes | Code is made available on GitHub. Code and scripts are provided in the supplemental material.
Open Datasets | Yes | We consider ConceptNet (Speer et al., 2017), a commonsense knowledge graph, as the KG used in our experiments. We consider CommonsenseQA (CSQA) (Talmor et al., 2019) and OpenBookQA (OBQA) (Mihaylov et al., 2018) as the commonsense reasoning tasks... For the zero-shot VQA tasks, we consider ScienceQA (Lu et al., 2022), MMBench (Liu et al., 2023c) and POPE (Li et al., 2023b)...
Dataset Splits | No | The paper mentions 'training data' and 'fine-tuning' but does not explicitly provide training/validation/test split details for its own experimental process, beyond naming the test sets of the evaluation benchmarks.
Hardware Specification | Yes | Experiments of this paper were all conducted on NVIDIA RTX A6000 GPU clusters.
Software Dependencies | No | The paper mentions 'llava-v1.6-mistral-7b' as the base model and the Graphviz tool, but does not specify version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We present the fine-tuning hyperparameters of GraphVis in Table 5. Table 5 includes: lora_r 128, lora_alpha 256, lora_target all, Learning rate 1e-7, Optimizer AdamW, Global batch size 4, gradient_accumulation_steps 1, weight_decay 0, warmup_ratio 0.03, lr_scheduler_type cosine, image_aspect_ratio pad, group_by_modality_length True, model_max_length 2048, mm_projector_lr 2e-5, mm_projector_type mlp2x_gelu.
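The learning-rate schedule these settings imply (Learning rate 1e-7 with warmup_ratio 0.03 and lr_scheduler_type cosine) can be sketched in plain Python. This is a minimal illustration of a standard linear-warmup-plus-cosine-decay schedule; the function and constant names below are ours, not taken from the paper's released code.

```python
import math

# Values from Table 5 above; names in this sketch are illustrative only.
BASE_LR = 1e-7       # Learning rate
WARMUP_RATIO = 0.03  # warmup_ratio

def lr_at(step: int, total_steps: int) -> float:
    """Per-step learning rate: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 1000 total steps the warmup lasts 30 steps, the rate peaks at the base value of 1e-7, and it decays toward zero by the final step.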