Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining

Authors: Ningyu Zhang, Shumin Deng, Xu Cheng, Xi Chen, Yichi Zhang, Wei Zhang, Huajun Chen

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmark datasets demonstrate that our approach can enhance state-of-the-art knowledge injection methods.
Researcher Affiliation | Collaboration | Ningyu Zhang 1,2, Shumin Deng 1,2, Xu Cheng 3, Xi Chen 5, Yichi Zhang 4, Wei Zhang 4, Huajun Chen 1,2. 1 Zhejiang University & AZFT Joint Lab for Knowledge Engine; 2 Hangzhou Innovation Center, Zhejiang University; 3 National Engineering Laboratory for Improving the Government's Governance Capability Big Data Application Technology; 4 Alibaba Group; 5 Tencent
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released, nor does it provide a link to a code repository.
Open Datasets | Yes | TACRED [Zhang et al., 2017] is a large-scale relation extraction dataset that covers 42 relation types and contains 106,264 sentences. Open Entity [Choi et al., 2018] is a completely manually annotated entity typing dataset. SearchQA [Dunn et al., 2017] is a large-scale question answering dataset constructed to reflect a full pipeline of general question answering. Quasar-T [Dhingra et al., 2017] is a large-scale question answering dataset consisting of 43,000 open-domain trivia questions and their answers obtained from various internet sources. GLUE [Wang et al., 2019a] is a benchmark with nine diverse NLP tasks.
Dataset Splits | No | The paper mentions several datasets (TACRED, Open Entity, SearchQA, Quasar-T, GLUE) but only explicitly refers to the 'GLUE dev set' in Table 2. It does not provide specific training/validation/test splits (e.g., percentages or sample counts) for all datasets used, nor does it cite predefined splits for all of them.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using pre-trained models like BERT-base and RoBERTa-base, and implementing ERNIE and KnowBERT, but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In this case, η was set to 0.001, k was set to 1, λ was set to 0.5, γ was set to 0.0001, Khop/thresh/min/max was set to {6,100,5,20}, and the batch size was set to 32.
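The reported hyperparameters can be collected into a single configuration mapping. A minimal sketch, assuming descriptive key names for the paper's symbols (η, k, λ, γ, K_hop/thresh/min/max); the paper itself does not name a framework or config format:

```python
# Hypothetical experiment configuration reconstructed from the quoted setup.
# Key names are assumptions; only the symbols and values come from the paper.
config = {
    "eta": 0.001,        # η, presumably the learning rate
    "k": 1,              # k
    "lambda": 0.5,       # λ, e.g. a loss-weighting coefficient
    "gamma": 0.0001,     # γ
    "K": {               # Khop/thresh/min/max = {6, 100, 5, 20}
        "hop": 6,
        "thresh": 100,
        "min": 5,
        "max": 20,
    },
    "batch_size": 32,
}

if __name__ == "__main__":
    for name, value in config.items():
        print(f"{name}: {value}")
```

Such a dict could be passed to a training script or serialized to JSON; since the paper reports no hardware or software versions, exact reproduction would still require contacting the authors.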