JAKET: Joint Pre-training of Knowledge Graph and Language Understanding
Authors: Donghan Yu, Chenguang Zhu, Yiming Yang, Michael Zeng (pp. 11630-11638)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on several knowledge-aware NLP tasks show that our proposed framework achieves superior performance by effectively leveraging knowledge in language understanding. |
| Researcher Affiliation | Collaboration | Donghan Yu¹*, Chenguang Zhu²*, Yiming Yang¹, Michael Zeng²; ¹Carnegie Mellon University, ²Microsoft Cognitive Services Research Group; dyu2@cs.cmu.edu, chezhu@microsoft.com |
| Pseudocode | No | The paper describes the model's steps and logic in textual form and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Data for Pre-training. We use the English Wikipedia as the text corpus, Wikidata (Vrandečić and Krötzsch 2014) as the knowledge graph, and SLING (Ringgaard, Gupta, and Pereira 2017) to identify entity mentions. |
| Dataset Splits | Yes | We conduct experiments under a semi-supervised transductive setting by splitting the entities in KG into train/dev/test splits of 20%, 20% and 60%. |
| Hardware Specification | Yes | The computing infrastructure we use is the NVIDIA V100 GPU in all the experiments. |
| Software Dependencies | No | The paper states, 'Our implementation is based on the Hugging Face framework (Wolf et al. 2019) and DGL (Wang et al. 2019a)', but it does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | During pre-training, the batch size and length of text sequences are 1024 and 512 respectively. The batch size of KG entities is 16,384. The number of training epochs is 8. JAKET is optimized by AdamW (Loshchilov and Hutter 2019) using the following parameters: β1 = 0.9, β2 = 0.999, ϵ = 1e-8, and weight decay of 0.01. The learning rate of the language module is warmed up over the first 3,000 steps to a peak value of 1e-5, and then linearly decayed. The learning rate of our knowledge module starts from 1e-4 and is then linearly decayed. |
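
The 20%/20%/60% transductive entity split reported in the "Dataset Splits" row can be reproduced in a few lines of NumPy. This is a minimal sketch under stated assumptions: the paper only gives the ratios, so the random seed and the uniform permutation used here are illustrative choices, not the authors' procedure.

```python
# Minimal sketch of a 20%/20%/60% transductive split over KG entities.
# The seed and the uniform random permutation are illustrative assumptions;
# the quoted text only states the split ratios.
import numpy as np

def split_entities(num_entities: int, seed: int = 0):
    """Return (train, dev, test) entity-ID arrays in a 20/20/60 ratio."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_entities)
    n_train = int(0.2 * num_entities)
    n_dev = int(0.2 * num_entities)
    return (
        perm[:n_train],                 # 20% train entities
        perm[n_train:n_train + n_dev],  # 20% dev entities
        perm[n_train + n_dev:],         # remaining 60% test entities
    )

train_ids, dev_ids, test_ids = split_entities(num_entities=10_000)
```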
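
The "Software Dependencies" and "Experiment Setup" rows together outline the pre-training optimization. Below is a rough PyTorch rendering, not the authors' code: the RoBERTa-base language module from Hugging Face Transformers, the single DGL GATConv layer standing in for the knowledge module, the total step count, and the per-module parameter grouping are assumptions made for illustration, while the AdamW hyperparameters, the 3,000-step warmup to 1e-5, and the 1e-4 linearly decayed knowledge-module rate come from the quoted text.

```python
# Rough sketch of the pre-training optimization described above.
# The RoBERTa-base language module, the GATConv knowledge module, and
# total_steps are illustrative assumptions; the AdamW hyperparameters and
# the learning-rate schedules follow the quoted "Experiment Setup" row.
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from transformers import RobertaModel   # Hugging Face framework
from dgl.nn import GATConv              # DGL

language_module = RobertaModel.from_pretrained("roberta-base")        # assumption
knowledge_module = GATConv(in_feats=768, out_feats=768, num_heads=1)  # assumption

warmup_steps = 3_000   # language-module warmup (from the quoted setup)
total_steps = 100_000  # placeholder: not reported; implied by 8 epochs over the corpus

optimizer = AdamW(
    [
        {"params": language_module.parameters(), "lr": 1e-5},   # peak LR, language module
        {"params": knowledge_module.parameters(), "lr": 1e-4},  # starting LR, knowledge module
    ],
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

def lm_schedule(step: int) -> float:
    """Warm up to the peak LR over the first 3,000 steps, then decay linearly."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

def kg_schedule(step: int) -> float:
    """No warmup: start at the full LR and decay linearly to zero."""
    return max(0.0, (total_steps - step) / total_steps)

# One multiplier per parameter group lets the two modules follow different schedules.
scheduler = LambdaLR(optimizer, lr_lambda=[lm_schedule, kg_schedule])
```

Calling `optimizer.step()` followed by `scheduler.step()` once per batch advances both schedules; `LambdaLR` applies one multiplier per parameter group, which is what allows a single optimizer to give the language and knowledge modules different learning-rate trajectories.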