KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge
Authors: Pengcheng Jiang, Lang Cao, Cao (Danica) Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the benchmark datasets FB15K-237, YAGO3-10, and PrimeKG demonstrate the superiority of KG-FIT over state-of-the-art pre-trained language model-based methods, achieving improvements of 14.4%, 13.5%, and 11.9% in the Hits@10 metric for the link prediction task, respectively. Furthermore, KG-FIT yields substantial performance gains of 12.6%, 6.7%, and 17.7% compared to the structure-based base models upon which it is built. |
| Researcher Affiliation | Collaboration | University of Illinois at Urbana-Champaign; GE HealthCare |
| Pseudocode | Yes | Algorithm 1 Seed Hierarchy Construction; Algorithm 2 LLM-Guided Cluster Splitting; Algorithm 3 LLM-Guided Bottom-Up Hierarchy Refinement (see the hierarchy-construction sketch below the table) |
| Open Source Code | Yes | Our code and data are available at https://github.com/pat-jj/KG-FIT. |
| Open Datasets | Yes | (1) FB15k-237 [40] (CC BY 4.0) is a subset of Freebase [41], a large collaborative knowledge base, focusing on common knowledge; (2) YAGO3-10 [42] is a subset of YAGO [43] (CC BY 4.0), which is a large knowledge base derived from multiple sources including Wikipedia, WordNet, and GeoNames; (3) PrimeKG [44] (CC0 1.0) is a biomedical KG that integrates 20 biomedical resources. |
| Dataset Splits | Yes | Table 2: Dataset statistics. #Ent./#Rel.: number of entities/relations. #Train/#Valid/#Test: number of triples contained in the training/validation/testing set (see the split-statistics sketch below the table). |
| Hardware Specification | Yes | For FB15K-237, PrimeKG, and WN18RR, experiments are conducted on a machine equipped with two AMD EPYC 7513 32-Core Processors, 528GB RAM, eight NVIDIA RTX A6000 GPUs, and CUDA 12.4 and the NVIDIA driver version 550.76. For YAGO3-10, due to its large size, experiments are conducted on a machine equipped with two AMD EPYC 7513 32-Core Processors, 528GB RAM, and eight NVIDIA A100 80GB PCIe GPUs. |
| Software Dependencies | Yes | For FB15K-237, PrimeKG, and WN18RR, the machine uses CUDA 12.4 and the NVIDIA driver version 550.76. For YAGO3-10... The system uses CUDA 12.2 and the NVIDIA driver version 535.129.03 (see the environment-check sketch below the table). |
| Experiment Setup | Yes | Table 11: Summary of hyperparameters explored for both base models and KG-FIT. Table 12: Best hyperparameters grid-searched for base models on different datasets. Table 13: Hyperparameters used for KG-FIT with different base models on different datasets (see the grid-search sketch below the table). |
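
For the pseudocode row: the paper's Algorithm 1 constructs a seed hierarchy over entities, which the LLM-guided steps (Algorithms 2 and 3) then split and refine. Below is a minimal sketch of what a seed-hierarchy construction step could look like, assuming entity descriptions have already been embedded with some text-embedding model; the function names, threshold, and the choice of Ward agglomerative clustering are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: build a seed entity hierarchy by agglomerative
# clustering of pre-computed entity text embeddings.
# Assumes `entity_embeddings` is an (n_entities, dim) NumPy array;
# names and thresholds are illustrative, not from the KG-FIT codebase.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_seed_hierarchy(entity_embeddings: np.ndarray, distance_threshold: float):
    """Return a linkage matrix (full merge tree) and flat seed-cluster labels."""
    # Ward linkage over Euclidean distances gives a binary merge tree.
    Z = linkage(entity_embeddings, method="ward")
    # Cut the tree at a distance threshold to obtain seed clusters that
    # LLM-guided splitting/refinement (Algorithms 2-3) could then adjust.
    labels = fcluster(Z, t=distance_threshold, criterion="distance")
    return Z, labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_embeddings = rng.normal(size=(100, 16))  # stand-in for LLM embeddings
    Z, labels = build_seed_hierarchy(toy_embeddings, distance_threshold=6.0)
    print("number of seed clusters:", len(set(labels)))
```

Cutting the dendrogram at different thresholds changes the granularity of the seed clusters, which is the kind of decision the LLM-guided splitting step is intended to adjust.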
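For the dataset-splits row: the paper's Table 2 reports #Ent./#Rel. and the train/valid/test triple counts. A small sketch for recomputing those statistics from tab-separated split files follows; the file names and the head/relation/tail column order are assumptions about the released data, not verified here.

```python
# Hypothetical sketch: recount entities, relations, and triples per split
# from tab-separated KG files (paths and column order are assumptions).
from pathlib import Path

def split_statistics(data_dir: str) -> dict:
    entities, relations, counts = set(), set(), {}
    for split in ("train", "valid", "test"):
        path = Path(data_dir) / f"{split}.txt"
        n = 0
        with path.open() as f:
            for line in f:
                head, rel, tail = line.rstrip("\n").split("\t")
                entities.update((head, tail))
                relations.add(rel)
                n += 1
        counts[f"#{split.capitalize()}"] = n
    return {"#Ent.": len(entities), "#Rel.": len(relations), **counts}

# Example (path is illustrative):
# print(split_statistics("data/FB15k-237"))
```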
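For the hardware and software rows: when reproducing, it can help to log the local GPU model and CUDA build and compare them against the reported CUDA 12.x / RTX A6000 / A100 setup. A minimal PyTorch-based check is sketched below; using PyTorch for this is an assumption, and the paper's exact dependency list lives in the repository.

```python
# Hypothetical sketch: print the local GPU/CUDA environment to compare
# against the reported setup (CUDA 12.x, RTX A6000 or A100 GPUs).
import torch

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}:", torch.cuda.get_device_name(i))
```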
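For the experiment-setup row: Tables 11-13 summarize the explored and selected hyperparameters. A generic grid-search sketch is shown below; the hyperparameter names and value ranges are placeholders, not the grids actually reported in those tables.

```python
# Hypothetical sketch: exhaustive grid search over a small hyperparameter
# grid; values are placeholders, not the paper's Tables 11-13.
from itertools import product

GRID = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "embedding_dim": [256, 512],
    "batch_size": [512, 1024],
}

def grid_search(train_and_eval):
    """`train_and_eval(config) -> float` returns a validation metric (e.g. Hits@10)."""
    best_score, best_config = float("-inf"), None
    keys = list(GRID)
    for values in product(*(GRID[k] for k in keys)):
        config = dict(zip(keys, values))
        score = train_and_eval(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

# Example with a dummy objective (illustrative only):
# best, score = grid_search(lambda cfg: -cfg["learning_rate"])
```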