RAGraph: A General Retrieval-Augmented Graph Learning Framework

Authors: Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Chu Xu, Junfeng Zhao, Yasha Wang

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our extensive experimental evaluations demonstrate that RAGRAPH significantly outperforms state-of-the-art graph learning methods in multiple tasks such as node classification, link prediction, and graph classification across both dynamic and static datasets." |
| Researcher Affiliation | Academia | Key Laboratory of High Confidence Software Technologies (Peking University), School of Computer Science, Peking University, China; No Affiliation; University of Electronic Science and Technology of China; Center on Frontiers of Computing Studies, Peking University, Beijing, China; Big Data Technology Research Center, Nanhu Laboratory, Jiaxing, China; Peking University Information Technology Institute, Tianjin Binhai, China |
| Pseudocode | Yes | Algorithm 1 Toy Graph Construction... Algorithm 2 Training and Inference with Toy Graphs Retrieval |
| Open Source Code | Yes | https://github.com/Artessay/RAGraph/ ... "Code is anonymously available at https://anonymous.4open.science/r/GLM-RAG-049D/." |
| Open Datasets | Yes | "We use four static datasets PROTEINS, COX2, ENZYMES and BZR for graph classification and node classification, as well as three dynamic datasets TAOBAO, KOUBEI and AMAZON for link prediction... To evaluate the efficacy of this work, we conducted experiments that only use publicly available datasets, namely, PROTEINS, COX2, ENZYMES, BZR, TAOBAO, KOUBEI and AMAZON in accordance with their usage terms and conditions if any." |
| Dataset Splits | No | The paper specifies a "training-resource split" and a "remainder of the data reserved as unseen during fine-tuning", and for static graphs a "node partitioning with the ratio of 50%:30%", but does not explicitly detail a separate validation split percentage or methodology. |
| Hardware Specification | Yes | "Implementations are done using the PyTorch 2.3.0 framework [79] in Python 3.11, on an Ubuntu server equipped with 1 V100 GPU and an Intel(R) Xeon(R) CPU." |
| Software Dependencies | Yes | "Implementations are done using the PyTorch 2.3.0 framework [79] in Python 3.11, on an Ubuntu server equipped with 1 V100 GPU and an Intel(R) Xeon(R) CPU." |
| Experiment Setup | Yes | "In node and graph classification tasks: For baseline GCN [50], we employ a 2-layer architecture and set the hidden dimension as 256... In PRODIGY and RAGRAPH, k is set to 2, topK is set to 5, γ is set to 0.8 for PROTEINS and 0.5 for ENZYMES at node level; γ is set to 0.5 for PROTEINS, 0.6 for COX2, 0.8 for ENZYMES and 0.5 for BZR at graph level; α = λ = 0.5, K = 3, w1 = w2 = w3 = 0.05, w4 = 0.85." |
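For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is a minimal illustrative sketch only; the key names (`hops_k`, `loss_weights`, etc.) are assumptions and do not come from the authors' released code.

```python
# Hedged sketch: the hyperparameters reported for PRODIGY/RAGRAPH in the
# Experiment Setup row, gathered into one dict. Key names are illustrative
# assumptions, not identifiers from the paper's repository.
ragraph_config = {
    "gcn": {"num_layers": 2, "hidden_dim": 256},   # baseline GCN
    "hops_k": 2,                                   # k (neighborhood hops)
    "top_k": 5,                                    # topK retrieved toy graphs
    "gamma_node": {"PROTEINS": 0.8, "ENZYMES": 0.5},
    "gamma_graph": {"PROTEINS": 0.5, "COX2": 0.6, "ENZYMES": 0.8, "BZR": 0.5},
    "alpha": 0.5,
    "lambda_": 0.5,
    "K": 3,
    "loss_weights": {"w1": 0.05, "w2": 0.05, "w3": 0.05, "w4": 0.85},
}

# Sanity check: the four reported loss weights sum to 1.0.
total = sum(ragraph_config["loss_weights"].values())
assert abs(total - 1.0) < 1e-9
```

Keeping per-dataset γ values in nested dicts makes the node-level vs. graph-level distinction from the quote explicit.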