Graph Neural Prompting with Large Language Models

Authors: Yijun Tian, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V. Chawla, Panpan Xu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings.
Researcher Affiliation | Collaboration | Yijun Tian (1), Huan Song (2), Zichen Wang (2), Haozhu Wang (2), Ziqing Hu (2), Fang Wang (2), Nitesh V. Chawla (1), Panpan Xu (2); (1) University of Notre Dame, (2) Amazon
Pseudocode | No | The paper describes its method in text and mathematical equations but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/meettyj/GNP.
Open Datasets | Yes | For the used knowledge graphs, we consider ConceptNet (Speer, Chin, and Havasi 2017) that contains rich commonsense knowledge regarding the daily concepts, and Unified Medical Language System (UMLS) (Bodenreider 2004) that involves well-structured health and biomedical information. For datasets, we use four commonsense reasoning datasets, including OpenBookQA (OBQA) (Mihaylov et al. 2018), AI2 Reasoning Challenge (ARC) (Clark et al. 2018), Physical Interaction Question Answering (PIQA) (Bisk et al. 2020), and RiddleSense (Riddle) (Lin et al. 2021). In addition, we consider PubMedQA (PQA) (Jin et al. 2019) and BioASQ (Tsatsaronis et al. 2015) for biomedical reasoning.
Dataset Splits | Yes | Implementation Details. For the proposed model, we set the learning rate to 1e-4, batch size to 8, hidden dimension of GNN to 1024, and training epochs to 50. In order to adapt the model effectively to each dataset, we search the GNN layers from 2 to 5, cross-modality pooling layers from 1 to 3, trade-off weight λ from {0.1, 0.5}, and link drop rate from {0.1, 0.3, 0.7}.
Hardware Specification | Yes | We run all experiments on four NVIDIA Tesla V100 GPUs with 24GB RAM.
Software Dependencies | No | The paper mentions using FLAN-T5 LLMs and specifies hyper-parameters, but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the proposed model, we set the learning rate to 1e-4, batch size to 8, hidden dimension of GNN to 1024, and training epochs to 50. In order to adapt the model effectively to each dataset, we search the GNN layers from 2 to 5, cross-modality pooling layers from 1 to 3, trade-off weight λ from {0.1, 0.5}, and link drop rate from {0.1, 0.3, 0.7}.
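To gauge the tuning budget implied by the Experiment Setup row, the sketch below expands the reported grid (GNN layers 2 to 5, cross-modality pooling layers 1 to 3, λ in {0.1, 0.5}, link drop rate in {0.1, 0.3, 0.7}) around the fixed settings. It is a minimal illustration under stated assumptions: the dictionary keys, the grid() helper, and the enumeration via itertools.product are illustrative choices, not code from the GNP repository.

    from itertools import product

    # Fixed hyper-parameters reported in the paper (see Experiment Setup above).
    FIXED = {
        "learning_rate": 1e-4,
        "batch_size": 8,
        "gnn_hidden_dim": 1024,
        "epochs": 50,
    }

    # Per-dataset search space reported in the paper.
    SEARCH_SPACE = {
        "gnn_layers": [2, 3, 4, 5],
        "cross_modality_pooling_layers": [1, 2, 3],
        "trade_off_lambda": [0.1, 0.5],
        "link_drop_rate": [0.1, 0.3, 0.7],
    }

    def grid(space):
        """Yield one configuration dict per combination in the search space."""
        keys = list(space)
        for values in product(*(space[key] for key in keys)):
            yield dict(zip(keys, values))

    if __name__ == "__main__":
        configs = [dict(FIXED, **candidate) for candidate in grid(SEARCH_SPACE)]
        print(f"{len(configs)} candidate configurations per dataset")
        print(configs[0])

Expanding the grid this way gives 4 × 3 × 2 × 3 = 72 candidate configurations per dataset, each trained with the fixed learning rate, batch size, hidden dimension, and epoch count quoted above.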