K-BERT: Enabling Language Representation with Knowledge Graph
Authors: Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang
AAAI 2020, pp. 2901-2908 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our investigation reveals promising results in twelve NLP tasks. Especially in domain-specific tasks (including finance, law, and medicine), K-BERT significantly outperforms BERT, which demonstrates that K-BERT is an excellent choice for solving the knowledge-driven problems that require experts. In this section, we present the details of the K-BERT fine-tuning results on twelve Chinese NLP tasks, among which eight are open-domain, and four are specific-domain. |
| Researcher Affiliation | Collaboration | Weijie Liu (1), Peng Zhou (2), Zhe Zhao (2), Zhiruo Wang (3), Qi Ju (2)*, Haotang Deng (2), Ping Wang (1); (1) Peking University, Beijing, China; (2) Tencent Research, Beijing, China; (3) Beijing Normal University, Beijing, China |
| Pseudocode | No | The paper describes the methodology in prose and with diagrams (Figure 1, 2, 4) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes of K-BERT and our self-developed knowledge graphs are publicly available at https://github.com/autoliuweijie/K-BERT. |
| Open Datasets | Yes | Wiki Zh: the Chinese Wikipedia corpus used to train Chinese BERT in (Devlin et al. 2018); it contains a total of 1 million well-formed Chinese entries with 12 million sentences and a size of 1.2 GB. Webtext Zh: a large-scale, high-quality Chinese question-and-answer (Q&A) corpus with 4.1 million entries and a size of 3.7 GB. ... We employ three Chinese KGs: CN-DBpedia, HowNet, and Medical KG. ... The codes of K-BERT and our self-developed knowledge graphs are publicly available at https://github.com/autoliuweijie/K-BERT. |
| Dataset Splits | Yes | Each of the above datasets is divided into three parts: train, dev, and test. We use the train part to fine-tune the model and then evaluate its performance on the dev and test parts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using BERT models and references the UER toolkit but does not provide specific version numbers for software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We denote the number of (mask-)self-attention layers and heads as L and A respectively, and the hidden dimension of embedding vectors as H. In detail, we have the following model configuration: L = 12, A = 12 and H = 768. The total amount of trainable parameters of both BERT and K-BERT is the same (110M)... For K-BERT pre-training, all settings are consistent with (Devlin et al. 2018). (A rough parameter-count sketch for this configuration follows the table.) |
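
As a rough sanity check on the configuration quoted in the Experiment Setup row, the sketch below counts the trainable parameters of a BERT-base-style encoder with L = 12 layers, A = 12 heads, and H = 768. The vocabulary size, maximum position embeddings, and type-vocabulary size are assumed BERT-base defaults, not values stated in the table above; the total lands near 110M, consistent with the quoted figure, and per the same quote K-BERT adds no trainable parameters beyond this backbone.

```python
# Back-of-the-envelope parameter count for a BERT-base-style encoder
# (L = 12 layers, A = 12 heads, H = 768). Vocabulary size, maximum
# position embeddings, and type-vocabulary size are assumed BERT-base
# defaults, not values taken from the paper.

L, H = 12, 768                            # layers and hidden size (the A = 12 heads share H)
FFN = 4 * H                               # feed-forward inner size used by BERT-base
VOCAB, MAX_POS, TYPES = 30_522, 512, 2    # assumed BERT-base defaults

def dense(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out

embeddings = (VOCAB + MAX_POS + TYPES) * H + 2 * H   # token/position/type tables + LayerNorm
per_layer = (
    4 * dense(H, H)                       # Q, K, V and attention output projections
    + 2 * H                               # LayerNorm after attention
    + dense(H, FFN) + dense(FFN, H)       # feed-forward block
    + 2 * H                               # LayerNorm after feed-forward
)
pooler = dense(H, H)

total = embeddings + L * per_layer + pooler
print(f"{total / 1e6:.1f}M parameters")   # ~109.5M, i.e. roughly the 110M quoted above
```

The exact total depends on the vocabulary of the checkpoint used; the figure above uses the standard BERT-base vocabulary purely for illustration.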