JAKET: Joint Pre-training of Knowledge Graph and Language Understanding
Authors: Donghan Yu, Chenguang Zhu, Yiming Yang, Michael Zeng (pp. 11630-11638)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on several knowledge-aware NLP tasks show that our proposed framework achieves superior performance by effectively leveraging knowledge in language understanding. |
| Researcher Affiliation | Collaboration | Donghan Yu¹*, Chenguang Zhu²*, Yiming Yang¹, Michael Zeng²; ¹Carnegie Mellon University, ²Microsoft Cognitive Services Research Group; dyu2@cs.cmu.edu, chezhu@microsoft.com |
| Pseudocode | No | The paper describes the model's steps and logic in textual form and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Data for Pre-training. We use the English Wikipedia as the text corpus, Wikidata (Vrandečić and Krötzsch 2014) as the knowledge graph, and SLING (Ringgaard, Gupta, and Pereira 2017) to identify entity mentions. |
| Dataset Splits | Yes | We conduct experiments under a semi-supervised transductive setting by splitting the entities in KG into train/dev/test splits of 20%, 20% and 60%. |
| Hardware Specification | Yes | The computing infrastructure we use is the NVIDIA V100 GPU in all the experiments. |
| Software Dependencies | No | The paper states, 'Our implementation is based on the Hugging Face framework (Wolf et al. 2019) and DGL (Wang et al. 2019a)', but it does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | During pre-training, the batch size and length of text sequences are 1024 and 512 respectively. The batch size of KG entities is 16,384. The number of training epochs is 8. JAKET is optimized by AdamW (Loshchilov and Hutter 2019) using the following parameters: β1 = 0.9, β2 = 0.999, ϵ = 1e-8, and weight decay of 0.01. The learning rate of the language module is warmed up over the first 3,000 steps to a peak value of 1e-5, and then linearly decayed. The learning rate of our knowledge module starts from 1e-4 and is then linearly decayed. |
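
The 20%/20%/60% transductive entity split reported in the "Dataset Splits" row can be reproduced in a few lines of NumPy. This is a minimal sketch under stated assumptions: the paper only gives the ratios, so the random seed and the uniform permutation used here are illustrative choices, not the authors' procedure.

```python
# Minimal sketch of a 20%/20%/60% transductive split over KG entities.
# The seed and the uniform random permutation are illustrative assumptions;
# the quoted text only states the split ratios.
import numpy as np

def split_entities(num_entities: int, seed: int = 0):
    """Return (train, dev, test) entity-ID arrays in a 20/20/60 ratio."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_entities)
    n_train = int(0.2 * num_entities)
    n_dev = int(0.2 * num_entities)
    return (
        perm[:n_train],                 # 20% train entities
        perm[n_train:n_train + n_dev],  # 20% dev entities
        perm[n_train + n_dev:],         # remaining 60% test entities
    )

train_ids, dev_ids, test_ids = split_entities(num_entities=10_000)
```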
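
The "Software Dependencies" and "Experiment Setup" rows together outline the pre-training optimization. Below is a rough PyTorch rendering, not the authors' code: the RoBERTa-base language module from Hugging Face Transformers, the single DGL GATConv layer standing in for the knowledge module, the total step count, and the per-module parameter grouping are assumptions made for illustration, while the AdamW hyperparameters, the 3,000-step warmup to 1e-5, and the 1e-4 linearly decayed knowledge-module rate come from the quoted text.

```python
# Rough sketch of the pre-training optimization described above.
# The RoBERTa-base language module, the GATConv knowledge module, and
# total_steps are illustrative assumptions; the AdamW hyperparameters and
# the learning-rate schedules follow the quoted "Experiment Setup" row.
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from transformers import RobertaModel   # Hugging Face framework
from dgl.nn import GATConv              # DGL

language_module = RobertaModel.from_pretrained("roberta-base")        # assumption
knowledge_module = GATConv(in_feats=768, out_feats=768, num_heads=1)  # assumption

warmup_steps = 3_000   # language-module warmup (from the quoted setup)
total_steps = 100_000  # placeholder: not reported; implied by 8 epochs over the corpus

optimizer = AdamW(
    [
        {"params": language_module.parameters(), "lr": 1e-5},   # peak LR, language module
        {"params": knowledge_module.parameters(), "lr": 1e-4},  # starting LR, knowledge module
    ],
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

def lm_schedule(step: int) -> float:
    """Warm up to the peak LR over the first 3,000 steps, then decay linearly."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

def kg_schedule(step: int) -> float:
    """No warmup: start at the full LR and decay linearly to zero."""
    return max(0.0, (total_steps - step) / total_steps)

# One multiplier per parameter group lets the two modules follow different schedules.
scheduler = LambdaLR(optimizer, lr_lambda=[lm_schedule, kg_schedule])
```

Calling `optimizer.step()` followed by `scheduler.step()` once per batch advances both schedules; `LambdaLR` applies one multiplier per parameter group, which is what allows a single optimizer to give the language and knowledge modules different learning-rate trajectories.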