K-BERT: Enabling Language Representation with Knowledge Graph

Authors: Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang (pp. 2901-2908)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts. In this section, we present the details of the K-BERT fine-tuning results on twelve Chinese NLP tasks, among which eight are open-domain and four are specific-domain.
Researcher Affiliation | Collaboration | Weijie Liu (1), Peng Zhou (2), Zhe Zhao (2), Zhiruo Wang (3), Qi Ju (2,*), Haotang Deng (2), Ping Wang (1); (1) Peking University, Beijing, China; (2) Tencent Research, Beijing, China; (3) Beijing Normal University, Beijing, China
Pseudocode | No | The paper describes the methodology in prose and with diagrams (Figures 1, 2, and 4) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The codes of K-BERT and our self-developed knowledge graphs are publicly available at https://github.com/autoliuweijie/K-BERT.
Open Datasets | Yes | Wiki Zh: Wiki Zh refers to the Chinese Wikipedia corpus, which is used to train Chinese BERT in (Devlin et al. 2018). Wiki Zh contains a total of 1 million well-formed Chinese entries with 12 million sentences and a size of 1.2G. Webtext Zh: Webtext Zh is a large-scale, high-quality Chinese question and answer (Q&A) corpus with 4.1 million entries and a size of 3.7G. ... We employ three Chinese KGs: CN-DBpedia, HowNet, and Medical KG. ... The codes of K-BERT and our self-developed knowledge graphs are publicly available at https://github.com/autoliuweijie/K-BERT.
Dataset Splits | Yes | Each of the above datasets is divided into three parts: train, dev, and test. We use the train part to fine-tune the model and then evaluate its performance on the dev and test parts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using BERT models and references the UER toolkit but does not provide specific version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We denote the number of (mask-)self-attention layers and heads as L and A respectively, and the hidden dimension of embedding vectors as H. In detail, we have the following model configuration: L = 12, A = 12 and H = 768. The total amounts of trainable parameters of both BERT and K-BERT are the same (110M)... For K-BERT pre-training, all settings are consistent with (Devlin et al. 2018).
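To make the reported configuration concrete, below is a minimal sketch (not from the paper or the K-BERT repository) that estimates the trainable parameter count of a BERT-base-sized encoder from the reported L = 12 layers, A = 12 heads, and H = 768. The vocabulary size (30522) and maximum position length (512) are assumed values from Google's English BERT-base release; the soft positions and visible matrix of K-BERT add no trainable weights, which is consistent with the paper reporting the same ~110M total for BERT and K-BERT.

```python
# Minimal sketch (assumes a standard BERT-base architecture; not the authors' code).
# Estimates the trainable parameter count from the configuration reported in the paper.

def approx_bert_params(num_layers=12,      # L: (mask-)self-attention layers
                       hidden=768,         # H: hidden/embedding dimension
                       ffn=3072,           # feed-forward size (4 * H in BERT-base)
                       vocab_size=30522,   # assumed: English BERT-base vocabulary
                       max_pos=512,        # assumed: maximum sequence length
                       type_vocab=2):      # segment (token-type) embeddings
    # Embedding tables (token, position, segment) plus the embedding LayerNorm.
    embeddings = (vocab_size + max_pos + type_vocab) * hidden + 2 * hidden

    # One encoder layer: Q/K/V/output projections with biases,
    # two feed-forward projections with biases, and two LayerNorms.
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * 2 * hidden
    per_layer = attention + feed_forward + layer_norms

    # Pooler on top of the final [CLS] hidden state.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * per_layer + pooler


print(f"~{approx_bert_params() / 1e6:.0f}M parameters")  # ~109M, close to the 110M reported
```

Note that the head count A does not change the total, since each head works on H / A dimensions inside the same H x H projections; the count is driven by L, H, and the vocabulary size, which is why a BERT-base configuration lands at roughly 110M parameters.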