Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining
Authors: Ningyu Zhang, Shumin Deng, Xu Cheng, Xi Chen, Yichi Zhang, Wei Zhang, Huajun Chen
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmark datasets demonstrate that our approach can enhance state-of-the-art knowledge injection methods. |
| Researcher Affiliation | Collaboration | Ningyu Zhang 1,2, Shumin Deng 1,2, Xu Cheng 3, Xi Chen 5, Yichi Zhang 4, Wei Zhang 4, Huajun Chen 1,2 — 1 Zhejiang University & AZFT Joint Lab for Knowledge Engine; 2 Hangzhou Innovation Center, Zhejiang University; 3 National Engineering Laboratory for Improving the Government's Governance Capability Big Data Application Technology; 4 Alibaba Group; 5 Tencent |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released or provide a link to a code repository. |
| Open Datasets | Yes | TACRED [Zhang et al., 2017] is a large-scale relation extraction dataset that covers 42 relation types and contains 106,264 sentences. Open Entity [Choi et al., 2018] is a completely manually annotated entity typing dataset. SearchQA [Dunn et al., 2017] is a large-scale question answering dataset constructed to reflect a full pipeline of general question answering. Quasar-T [Dhingra et al., 2017] is a large-scale question answering dataset consisting of 43,000 open-domain trivia questions and their answers obtained from various internet sources. GLUE [Wang et al., 2019a] is a benchmark with nine diverse NLP tasks. |
| Dataset Splits | No | The paper mentions several datasets (TACRED, Open Entity, SearchQA, Quasar-T, GLUE) but only explicitly refers to the 'GLUE dev set' in Table 2. It does not provide specific training/validation/test splits (e.g., percentages or sample counts) for all datasets used, nor does it cite predefined splits for all of them. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using pre-trained models such as BERT-base and RoBERTa-base, and implementing ERNIE and KnowBERT, but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In this case, η was set to 0.001, k was set to 1, λ was set to 0.5, γ was set to 0.0001, K_hop/thresh/min/max was set to {6, 100, 5, 20}, and the batch size was set to 32. |
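For reproduction attempts, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration mapping. This is a minimal sketch: the dictionary key names are hypothetical (the paper only reports the symbols η, k, λ, γ, and K_hop/thresh/min/max), and no official code release exists to confirm them.

```python
# Hyperparameters as reported in the paper's Experiment Setup.
# Key names are illustrative only; they do not come from any released code.
CONFIG = {
    "eta": 0.001,        # η, reported as 0.001 (likely a learning-rate-style value)
    "k": 1,              # k, reported as 1
    "lambda_": 0.5,      # λ, reported as 0.5
    "gamma": 0.0001,     # γ, reported as 0.0001
    "K_hop": 6,          # K hop/thresh/min/max reported as {6, 100, 5, 20}
    "K_thresh": 100,
    "K_min": 5,
    "K_max": 20,
    "batch_size": 32,    # batch size, reported as 32
}

def summarize(config: dict) -> str:
    """Render the configuration as a compact, human-readable string."""
    return ", ".join(f"{name}={value}" for name, value in config.items())

print(summarize(CONFIG))
```

Collecting the reported values in one place like this makes it easy to spot which settings the paper specifies and which (e.g., optimizer, number of epochs, warmup schedule) a reproducer would still have to guess.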