Rethinking Knowledge Graph Evaluation Under the Open-World Assumption

Authors: Haotong Yang, Zhouchen Lin, Muhan Zhang

Venue: NeurIPS 2022

Reproducibility checklist. Each entry below lists the variable, the assessed result, and the supporting LLM response.

Research Type: Experimental
LLM Response: In this paper, we study KGC evaluation under a more realistic setting, namely the open-world assumption, where unknown triplets are considered to include many missing facts not included in the training or test sets. We validate the phenomenon both theoretically and experimentally.

Researcher Affiliation: Academia
LLM Response: 1 Key Lab of Machine Perception (MoE), School of Intelligence Science and Technology, Peking University; 2 Institute for Artificial Intelligence, Peking University; 3 Peng Cheng Laboratory; 4 Beijing Institute for General Artificial Intelligence

Pseudocode: No
LLM Response: The paper does not include pseudocode or clearly labeled algorithm blocks.

Open Source Code: Yes
LLM Response: Our code and data are available at https://github.com/GraphPKU/Open-World-KG.

Open Datasets: Yes
LLM Response: Our code and data are available at https://github.com/GraphPKU/Open-World-KG.

Dataset Splits: Yes
LLM Response: In the rest of the paper, we denote the closed-world KG as Gfull. Because we want to study the evaluation, we denote the existing open-world dataset as Gtest and extract the training set Gtrain from Gtest. Here, Gtrain ⊆ Gtest ⊆ Gfull, and the facts in Gtrain, Gtest \ Gtrain, and Gfull \ Gtest are the training facts, test facts, and missing facts respectively. (A sketch of this split construction appears after the table.)

Hardware Specification: Yes
LLM Response: The experiments were run on two clusters with four NVIDIA A40 and six NVIDIA GeForce 3090 GPUs respectively.

Software Dependencies: No
LLM Response: The paper mentions training KGC models and adjusting hyperparameters but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).

Experiment Setup: No
LLM Response: We train four KGC models with different hyperparameter settings (resulting in 18 different models in total) and test them on the full test set Gfull \ Gtrain and the sparse test set Gtest \ Gtrain respectively. For each model, we adjust hyperparameters according to their official implementations and recommendations to obtain models with different performances. Due to the space limit, we do not list the concrete hyperparameters here; they can be found in our released code. (A sketch of this dual evaluation appears after the table.)

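The split in the Dataset Splits row reduces to plain set operations. Below is a minimal Python sketch of that construction, assuming each fact is a (head, relation, tail) tuple; the function name extract_splits, the 0.8 ratio, and the uniform random sampling are illustrative assumptions, not details taken from the paper or its released code.

```python
import random

def extract_splits(g_full, g_test, train_ratio=0.8, seed=0):
    """Build Gtrain so that Gtrain ⊆ Gtest ⊆ Gfull, then return the
    three fact groups named in the paper. The ratio and sampling
    strategy here are assumptions; the released code defines the
    actual split."""
    assert g_test <= g_full, "the setting requires Gtest ⊆ Gfull"
    rng = random.Random(seed)
    ordered = sorted(g_test)  # fix an order so sampling is reproducible
    g_train = set(rng.sample(ordered, int(train_ratio * len(ordered))))

    training_facts = g_train         # Gtrain
    test_facts = g_test - g_train    # Gtest \ Gtrain
    missing_facts = g_full - g_test  # Gfull \ Gtest: true but unobserved
    return training_facts, test_facts, missing_facts
```

By construction, the three returned sets are pairwise disjoint and their union is Gfull, which matches the partition into training, test, and missing facts quoted above.
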
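The Experiment Setup row describes scoring every trained model on both a full and a sparse test set. The following hypothetical loop mirrors that protocol; models (a name-to-model mapping) and score_fn (a stand-in for a ranking metric such as MRR or Hits@k) are placeholders, not the authors' actual API.

```python
def compare_test_sets(models, score_fn, g_train, g_test, g_full):
    r"""Score each model on the full test set (Gfull \ Gtrain) and the
    sparse test set (Gtest \ Gtrain). score_fn(model, facts) -> float
    is assumed, not part of the paper's released code."""
    full_test = g_full - g_train     # also contains the missing facts
    sparse_test = g_test - g_train   # the usual closed-world test set
    return {
        name: {"full": score_fn(model, full_test),
               "sparse": score_fn(model, sparse_test)}
        for name, model in models.items()
    }
```

Comparing the two scores per model is what exposes the gap between closed-world and open-world evaluation that the paper studies.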