BERT-INT: A BERT-based Interaction Model for Knowledge Graph Alignment

Authors: Xiaobin Tang, Jing Zhang, Bo Chen, Yang Yang, Hong Chen, Cuiping Li

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our model significantly outperforms the best state-of-the-art methods by 1.9-9.7% in terms of Hit Ratio@1 on the dataset DBP15K.
Researcher Affiliation | Academia | Xiaobin Tang [1,2], Jing Zhang [1,2], Bo Chen [1,2], Yang Yang [3], Hong Chen [1,2] and Cuiping Li [1,2]; [1] Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University of China; [2] Information School, Renmin University of China; [3] Zhejiang University; {txb, zhang-jing, bochen, chong, licuiping}@ruc.edu.cn, yangya@zju.edu.cn
Pseudocode | No | The paper describes the model architecture and steps in text and diagrams (Figure 2, Figure 3) but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Codes and datasets are online now: https://github.com/kosugi11037/bert-int
Open Datasets | Yes | We evaluate our model on the widely used cross-lingual dataset DBP15K and the mono-lingual dataset DWY100K and use Hit Ratio@K (K=1,10) and MRR to evaluate (Cf. [Sun et al., 2018] for details). (A minimal sketch of the Hit Ratio@K and MRR computation appears after this table.)
Dataset Splits | No | We evaluate our model on the widely used cross-lingual dataset DBP15K and the mono-lingual dataset DWY100K and use Hit Ratio@K (K=1,10) and MRR to evaluate (Cf. [Sun et al., 2018] for details). The paper uses widely recognized datasets, but it does not explicitly state the specific training, validation, and test dataset splits (e.g., percentages or sample counts) within the text.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using a 'pre-trained multi-lingual BERT' and links to its GitHub repository, but it does not specify concrete version numbers for BERT or for other software dependencies such as Python, PyTorch/TensorFlow, or specific libraries.
Experiment Setup | Yes | The dimension of the BERT CLS embedding is 768. We use a 300-dimension MLP in Eq.(1) and an (11+1)-dimension MLP in Eq.(5). The maximal numbers of neighbors and attributes are both set as 50. In Eq.(3), we use 20 semantic matching kernels, where µ ranges from 0.025 to 0.975 with interval 0.05 and all σ = 0.1, and an exact matching kernel with µ = 1.0 and σ = 10^-3. The number of candidates returned by the basic BERT unit, i.e., κ, is set as 50, as we find that 99% of the ground truth can be included in the top-50 candidates. The margin m in Eq.(2) for fine-tuning BERT is set as 3, and for training the interaction model it is set as 1.
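
For concreteness, the matching-kernel configuration quoted in the Experiment Setup row (20 semantic kernels with µ from 0.025 to 0.975 at interval 0.05 and σ = 0.1, plus an exact-match kernel with µ = 1.0 and σ = 10^-3) can be written as a small kernel-pooling routine. The sketch below is an illustrative reconstruction under those stated settings, not the authors' released code; the names make_kernel_params, kernel_features, and sim_matrix are assumptions.

```python
import torch

def make_kernel_params():
    # 20 semantic matching kernels: mu = 0.025, 0.075, ..., 0.975 with sigma = 0.1,
    # plus one exact-match kernel with mu = 1.0 and sigma = 1e-3, as quoted in the
    # Experiment Setup row above.
    mus = [0.025 + 0.05 * i for i in range(20)] + [1.0]
    sigmas = [0.1] * 20 + [1e-3]
    return torch.tensor(mus), torch.tensor(sigmas)

def kernel_features(sim_matrix, mus, sigmas):
    # sim_matrix: [n, m] pairwise similarities (e.g., cosine similarities between
    # the BERT CLS embeddings of two entities' neighbors or attribute values).
    sim = sim_matrix.unsqueeze(-1)                          # [n, m, 1]
    k = torch.exp(-((sim - mus) ** 2) / (2 * sigmas ** 2))  # [n, m, K]
    # Kernel pooling: sum kernel responses over one side, take the log,
    # then sum over the other side, giving a fixed K-dimensional vector.
    return k.sum(dim=1).clamp(min=1e-10).log().sum(dim=0)   # [K]

# Example: a random 5 x 7 similarity matrix yields a 21-dimensional feature.
mus, sigmas = make_kernel_params()
feats = kernel_features(torch.rand(5, 7), mus, sigmas)
```

With these settings the exact-match kernel (µ = 1.0, tiny σ) fires only for near-identical pairs, while the 20 broader kernels bin softer similarity values.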
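
The evaluation rows above report Hit Ratio@K (K=1,10) and MRR. Below is a minimal sketch of the standard way these metrics are computed from ranked candidate lists; it follows the usual entity-alignment evaluation protocol rather than the paper's exact scripts, and the names ranked_candidates and gold are illustrative.

```python
def hit_at_k_and_mrr(ranked_candidates, gold, ks=(1, 10)):
    # ranked_candidates: {source entity: candidate targets, best match first}
    # gold: {source entity: true counterpart entity}
    hits = {k: 0 for k in ks}
    rr_sum = 0.0
    for ent, candidates in ranked_candidates.items():
        target = gold[ent]
        if target in candidates:
            rank = candidates.index(target) + 1   # 1-based rank of the truth
            rr_sum += 1.0 / rank
            for k in ks:
                if rank <= k:
                    hits[k] += 1
    n = len(ranked_candidates)
    return {f"Hits@{k}": hits[k] / n for k in ks}, rr_sum / n

# Example: two test entities, hit at rank 1 and rank 3 -> Hits@1 = 0.5, MRR ~ 0.67.
print(hit_at_k_and_mrr({"a": ["x", "y", "z"], "b": ["p", "q", "r"]},
                       {"a": "x", "b": "r"}))
```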