RAGraph: A General Retrieval-Augmented Graph Learning Framework
Authors: Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Chu Xu, Junfeng Zhao, Yasha Wang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experimental evaluations demonstrate that RAGRAPH significantly outperforms state-of-the-art graph learning methods in multiple tasks such as node classification, link prediction, and graph classification across both dynamic and static datasets. |
| Researcher Affiliation | Academia | Key Laboratory of High Confidence Software Technologies (Peking University), School of Computer Science, Peking University, China; No Affiliation; University of Electronic and Science Technology of China; Center on Frontiers of Computing Studies, Peking University, Beijing, China; Big Data Technology Research Center, Nanhu Laboratory, Jiaxing, China; Peking University Information Technology Institute, Tianjin Binhai, China |
| Pseudocode | Yes | Algorithm 1 Toy Graph Construction... Algorithm 2 Training and Inference with Toy Graphs Retrieval |
| Open Source Code | Yes | https://github.com/Artessay/RAGraph/ ... Code is anonymously available at https://anonymous.4open.science/r/GLM-RAG-049D/. |
| Open Datasets | Yes | We use four static datasets PROTEINS, COX2, ENZYMES and BZR for graph classification and node classification, as well as three dynamic datasets TAOBAO, KOUBEI and AMAZON for link prediction... To evaluate the efficacy of this work, we conducted experiments that only use publicly available datasets, namely, PROTEINS, COX2, ENZYMES, BZR, TAOBAO, KOUBEI and AMAZON in accordance with their usage terms and conditions if any. |
| Dataset Splits | No | The paper specifies a 'training-resource split' and a 'remainder of the data reserved as unseen during fine-tuning', and for static graphs a 'node partitioning with the ratio of 50%:30%', but does not explicitly detail a separate 'validation' split percentage or methodology. |
| Hardware Specification | Yes | Implementations are done using the PyTorch 2.3.0 framework [79] in Python 3.11, on an Ubuntu server equipped with 1 V100 GPU and an Intel(R) Xeon(R) CPU. |
| Software Dependencies | Yes | Implementations are done using the PyTorch 2.3.0 framework [79] in Python 3.11, on an Ubuntu server equipped with 1 V100 GPU and an Intel(R) Xeon(R) CPU. |
| Experiment Setup | Yes | In node and graph classification tasks: For baseline GCN [50], we employ a 2-layer architecture and set the hidden dimension as 256... In PRODIGY and RAGRAPH, k is set to 2, top-K is set to 5, γ is set to 0.8 for PROTEINS and 0.5 for ENZYMES at node level, γ is set to 0.5 for PROTEINS, 0.6 for COX2, 0.8 for ENZYMES and 0.5 for BZR at graph level, α = λ = 0.5, K = 3, w1 = w2 = w3 = 0.05, w4 = 0.85. |
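As a reproducibility aid, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is a minimal sketch: the key names (`hidden_dim`, `top_k`, `hop_weights`, the `gamma_for` helper) are our own labels for the reported values, not identifiers from the authors' code.

```python
# Hedged sketch of the reported RAGRAPH/PRODIGY hyperparameters.
# All values are quoted from the paper's experiment setup; all names
# (keys, helper function) are assumptions made here for illustration.
CONFIG = {
    "gcn_layers": 2,      # baseline GCN depth
    "hidden_dim": 256,    # baseline GCN hidden dimension
    "k": 2,               # k-hop neighborhood
    "top_k": 5,           # top-K retrieved toy graphs
    "alpha": 0.5,
    "lambda": 0.5,
    "K": 3,
    "hop_weights": [0.05, 0.05, 0.05, 0.85],  # w1, w2, w3, w4
    # gamma is dataset- and task-level-specific
    "gamma": {
        "node":  {"PROTEINS": 0.8, "ENZYMES": 0.5},
        "graph": {"PROTEINS": 0.5, "COX2": 0.6, "ENZYMES": 0.8, "BZR": 0.5},
    },
}

def gamma_for(dataset: str, level: str) -> float:
    """Look up the reported gamma for a given dataset and task level."""
    return CONFIG["gamma"][level][dataset]
```

Note that the four hop weights sum to 1, which is consistent with their use as mixing coefficients.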