Entity Alignment with Reliable Path Reasoning and Relation-aware Heterogeneous Graph Transformer

Authors: Weishan Cai, Wenjun Ma, Jieyu Zhan, Yuncheng Jiang

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of RPR-RHGT on three widely used benchmark datasets. The code is now available at https://github.com/cwswork/RPR-RHGT. [...] Extensive experiments on three well-known datasets show RPR-RHGT significantly outperforms 10 state-of-the-art methods, exceeding the best-performing baseline by up to 8.62% on Hits@1. (A minimal Hits@k sketch appears after the table.)
Researcher Affiliation | Academia | 1 School of Computer Science, South China Normal University, China; 2 School of Computer and Information Engineering, Hanshan Normal University, China; 3 School of Artificial Intelligence, South China Normal University, China. caiws@m.scnu.edu.cn, phoenixsam@sina.com, {zhanjieyu,ycjiang}@scnu.edu.cn
Pseudocode | Yes | Algorithm 1: Procedure of the RPR Algorithm.
Open Source Code | Yes | The code is now available at https://github.com/cwswork/RPR-RHGT.
Open Datasets | Yes | The three experimental datasets comprise cross-lingual and mono-lingual KGs. WK3l-15K [Sun et al., 2020b] is drawn from multi-lingual DBpedia and is used to evaluate model performance on sparse and dense settings; each subset comes in two versions, where V1 is a sparse set obtained with the IDS algorithm and V2 is twice as dense as V1. DBP-15K [Sun et al., 2017], also from DBpedia, is the most widely used dataset in the literature. DWY-100K [Sun et al., 2018] contains two mono-lingual KGs and serves as a large-scale dataset for better evaluating the scalability of the experimental models.
Dataset Splits | Yes | For WK3l-15K and DBP-15K, the train/validation/test proportion is 2:1:7, the same as [Sun et al., 2020b]. For DWY-100K, we adopt the same train (30%) / test (70%) split as the baselines. (A split sketch follows the table.)
Hardware Specification | Yes | The results, obtained on a workstation with a CPU (EPYC 3975WX, 256 GB RAM) and a GPU (RTX A4000, 16 GB), are shown in Table 4, which reveals large runtime differences between methods.
Software Dependencies | No | We use fastText (https://fasttext.cc/docs/en/crawl-vectors.html) to generate entity name embeddings, which are applied uniformly when reproducing the baselines, including RDGCN, NMN, RAGA, MultiKE and COTSAE. (An embedding sketch follows the table.)
Experiment Setup | Yes | For all datasets, we use the same weight hyper-parameters: τ_sim = 0.5, τ_path = 20, h_n = 4, γ1 = γ2 = 10, θ = 0.3. The embedding dimensions for the 15K and 100K datasets are 300 and 200, respectively. (The values are collected into a config sketch below.)
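
The 2:1:7 split quoted above is straightforward to reproduce. Below is a minimal sketch, assuming the seed alignment is a list of (KG1 entity, KG2 entity) pairs; the function name and the dummy IDs are illustrative, not taken from the RPR-RHGT repository.

```python
# Minimal sketch (not the authors' code): splitting gold entity-alignment
# pairs into train/validation/test at the paper's 2:1:7 ratio.
import random

def split_pairs(pairs, ratios=(0.2, 0.1, 0.7), seed=42):
    """Shuffle aligned entity pairs and slice them by the given ratios."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = pairs[:n_train]
    valid = pairs[n_train:n_train + n_valid]
    test = pairs[n_train + n_valid:]
    return train, valid, test

# Example with dummy IDs; the real 15K datasets hold 15,000 reference pairs.
links = [(f"kg1:e{i}", f"kg2:e{i}") for i in range(15000)]
train, valid, test = split_pairs(links)
print(len(train), len(valid), len(test))  # 3000 1500 10500
```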
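For the entity name embeddings, a minimal sketch is shown below, assuming the official fasttext Python package and a pre-trained crawl-vector file (cc.en.300.bin) downloaded from the URL cited in the Software Dependencies row. Averaging word vectors over name tokens is one common scheme and is assumed here; it is not confirmed as the authors' exact procedure.

```python
# Minimal sketch: fastText entity name embeddings (300-d, matching the
# 15K setting). Assumes cc.en.300.bin has been downloaded locally.
import numpy as np
import fasttext

ft = fasttext.load_model("cc.en.300.bin")

def name_embedding(entity_name: str) -> np.ndarray:
    """Average fastText word vectors over the tokens of an entity name."""
    tokens = entity_name.replace("_", " ").split()
    vecs = [ft.get_word_vector(t) for t in tokens]
    return np.mean(vecs, axis=0) if vecs else np.zeros(ft.get_dimension())

emb = name_embedding("South_China_Normal_University")
print(emb.shape)  # (300,)
```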
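For reference, the quoted hyper-parameters can be collected into a single configuration mapping. The key names below transliterate the paper's symbols; their individual roles are not spelled out in the quoted text, and the repository may use different names.

```python
# Shared hyper-parameters quoted in the Experiment Setup row. No semantics
# are asserted beyond what the quoted text states.
CONFIG = {
    "tau_sim": 0.5,
    "tau_path": 20,
    "h_n": 4,
    "gamma_1": 10,
    "gamma_2": 10,
    "theta": 0.3,
    "embed_dim": {"15K": 300, "100K": 200},
}
```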
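Finally, the headline Hits@1 metric ranks every candidate target entity by embedding similarity and checks whether the gold counterpart lands in the top k. A minimal sketch, assuming row i of each embedding matrix corresponds to the i-th test pair (this is the standard formulation, not code from the repository):

```python
# Minimal sketch: Hits@k for entity alignment under cosine similarity.
import numpy as np

def hits_at_k(src_emb: np.ndarray, tgt_emb: np.ndarray, k: int = 1) -> float:
    """src_emb[i] should align with tgt_emb[i]; rows are L2-normalised."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # (n, n) cosine similarity matrix
    gold = np.diag(sim)
    # Rank of the gold target = how many candidates score at least as high.
    rank = (sim >= gold[:, None]).sum(axis=1)
    return float((rank <= k).mean())

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 300))
b = a + 0.1 * rng.normal(size=(100, 300))  # noisy copy: near-perfect alignment
print(hits_at_k(a, b, k=1))  # close to 1.0
```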