DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding

Authors: Taolin Zhang, Chengyu Wang, Nan Hu, Minghui Qiu, Chengguang Tang, Xiaofeng He, Jun Huang

AAAI 2022, pp. 11703-11711 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our model outperforms other KEPLMs significantly over zero-shot knowledge probing tasks and multiple knowledge-aware language understanding tasks. In the experiments, we evaluate our model against strong baseline KEPLMs pre-trained using the same data sources over various knowledge-related tasks, including knowledge probing (LAMA) (Petroni et al. 2019), relation extraction and entity typing.
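The knowledge probing setup cited above (LAMA) is a zero-shot cloze task: a KG triple is rewritten as a masked sentence and the pre-trained LM is scored on whether it recovers the object entity. Below is a minimal sketch of that probing idea using the Hugging Face fill-mask pipeline with a plain BERT-base checkpoint; the model name and prompt are illustrative placeholders, not the paper's DKPLM weights, which are not released.

```python
# Minimal sketch of LAMA-style zero-shot knowledge probing.
# Assumption: a generic masked LM (bert-base-uncased) stands in for the
# paper's DKPLM checkpoint, which is not publicly available.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# A LAMA query turns a KG triple (Paris, capital-of, ?) into a cloze prompt.
prompt = "Paris is the capital of [MASK]."
predictions = fill_mask(prompt, top_k=5)

for p in predictions:
    # token_str is the predicted filler; score is its LM probability.
    print(f"{p['token_str']:>10s}  {p['score']:.4f}")
```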
Researcher Affiliation | Collaboration | Taolin Zhang 1,3*, Chengyu Wang 2*, Nan Hu 5, Minghui Qiu 2, Chengguang Tang 2, Xiaofeng He 4,5, Jun Huang 2; 1 School of Software Engineering, East China Normal University; 2 Alibaba Group; 3 Shanghai Key Laboratory of Trustworthy Computing; 4 NPPA Key Laboratory of Publishing Integration Development, ECNUP; 5 School of Computer Science and Technology, East China Normal University; zhangtl0519@gmail.com, hunan.vinny1997@gmail.com, hexf@cs.ecnu.edu.cn, {chengyu.wcy, minghui.qmh, chengguang.tcg, huangjun.hj}@alibaba-inc.com
Pseudocode | No | The paper describes methods in prose and mathematical formulas, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code or links to a code repository.
Open Datasets | Yes | In this paper, we use English Wikipedia (2020/03/01) as our pre-training data source, and WikiExtractor to process the downloaded Wikipedia dump, similar to CoLAKE (Sun et al. 2020) and ERNIE-THU (Zhang et al. 2019). We use Wikipedia anchors to align the entities in the pre-training texts recognized by entity linking tools (e.g., TAGME (Ferragina and Scaiella 2010)) to Wikidata5M (Wang et al. 2019b), a recently proposed large-scale KG data source that includes relation triples and entity description texts.
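The data pipeline quoted above (Wikipedia dump, WikiExtractor, anchor-based entity linking, Wikidata5M alignment) boils down to a lookup from anchor targets to KG entity IDs. The sketch below is a rough approximation under stated assumptions: it presumes a local wikidata5m_entity.txt alias file (one entity per line, QID followed by tab-separated aliases) and anchors already pulled out of the extracted text; the file name, format, and helper functions are illustrative assumptions, not the paper's actual tooling.

```python
# Rough sketch of anchor-to-KG alignment: map Wikipedia anchor targets
# (page titles) to Wikidata5M entity IDs via an alias lookup table.
# Assumptions: "wikidata5m_entity.txt" exists locally with one entity per
# line (QID, then tab-separated aliases); anchors were extracted beforehand,
# e.g. with WikiExtractor run with link preservation enabled.

def load_alias_index(path="wikidata5m_entity.txt"):
    """Build a lowercase alias -> entity ID lookup table."""
    alias_to_qid = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            qid, *aliases = line.rstrip("\n").split("\t")
            for alias in aliases:
                alias_to_qid.setdefault(alias.lower(), qid)
    return alias_to_qid


def link_anchors(anchors, alias_to_qid):
    """anchors: (surface_form, target_page_title) pairs from Wikipedia anchor links."""
    linked = []
    for surface, target in anchors:
        qid = alias_to_qid.get(target.lower())
        if qid is not None:
            linked.append((surface, target, qid))
    return linked


# Hypothetical usage: two anchors taken from one pre-training sentence.
index = load_alias_index()
print(link_anchors([("Einstein", "Albert Einstein"), ("GR", "General relativity")], index))
```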
Dataset Splits | No | The paper refers to pre-training and fine-tuning stages and evaluation on multiple datasets but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages or sample counts) for its experiments.
Hardware Specification | No | The paper provides running times for different models but does not specify any hardware details (e.g., GPU model, CPU type, memory) used for running the experiments.
Software Dependencies | No | The paper mentions tools like WikiExtractor and TAGME and refers to models like BERT and RoBERTa, but it does not provide specific version numbers for these or any other software libraries or dependencies required to replicate the experiment.
Experiment Setup | No | The paper discusses 'Pre-training Data and Model Settings' and mentions using BERT-base and RoBERTa-base as backbones. However, it does not specify concrete hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations required for reproducibility in the main text.