Deep Bidirectional Language-Knowledge Graph Pretraining

Authors: Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D Manning, Percy S. Liang, Jure Leskovec

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment with the proposed approach DRAGON in a general domain first. We pretrain DRAGON using the BookCorpus and ConceptNet KG (§3.1), and evaluate on diverse downstream tasks (§3.2). We show that DRAGON significantly improves on existing models (§3.4). We extensively analyze the effect of DRAGON's key design choices such as self-supervision and use of KGs (§3.4.1, §3.4.2, §3.4.3).
Researcher Affiliation | Academia | 1 Stanford University, 2 EPFL. *Equal senior authorship. {myasu,antoineb,hyren,xikunz2,manning,pliang,jure}@cs.stanford.edu
Pseudocode | No | The paper describes the model architecture and training objectives in prose and mathematical formulas but does not provide any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our code and trained models are available at https://github.com/michiyasunaga/dragon.
Open Datasets | Yes | Data. For the text data, we use documents involving commonsense, BookCorpus [55]. BookCorpus has 6GB of text from online books and is widely used in LM pretraining (e.g., BERT, RoBERTa). For the KG data, we use ConceptNet [7], a general-domain knowledge graph designed to capture background commonsense knowledge. ... For the biomedical domain, we use PubMed [73]... and UMLS [17].
Dataset Splits | Yes | For CSQA, we follow the in-house data splits used by prior works [32]. For OBQA, we follow the original setting where the models only use the question as input and do not use the extra science facts. Appendix B.4 provides the full details on these tasks and data splits.
Hardware Specification | Yes | Training took 7 days on eight A100 GPUs using FP16.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers in the main text.
Experiment Setup | Yes | To pretrain the model, we perform MLM with a token masking rate of 15% and link prediction with an edge drop rate of 15%. We pretrain for 20,000 steps with a batch size of 8,192 and a learning rate of 2e-5 for parameters in the LM component and 3e-4 for the others. Training took 7 days on eight A100 GPUs using FP16. Additional details on the hyperparameters can be found in Appendix B.3.
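For concreteness, the quoted experiment setup can be summarized as a small configuration and loss sketch. This is a minimal, hypothetical sketch: the hyperparameter values come from the paper's Experiment Setup quote, but the names (PRETRAIN_CONFIG, joint_pretraining_loss) and the unweighted sum of the two objectives are illustrative assumptions, not the authors' released implementation (see https://github.com/michiyasunaga/dragon for the actual code).

```python
# Sketch of the DRAGON pretraining recipe as reported in the paper.
# Hyperparameters are copied from the Experiment Setup row above;
# everything else (names, loss weighting) is an assumption for illustration.

import torch

# Pretraining hyperparameters reported in the paper.
PRETRAIN_CONFIG = {
    "mlm_mask_rate": 0.15,        # token masking rate for masked language modeling
    "link_pred_drop_rate": 0.15,  # KG edge drop rate for link prediction
    "num_steps": 20_000,
    "batch_size": 8_192,
    "lr_lm": 2e-5,                # learning rate for the LM component
    "lr_other": 3e-4,             # learning rate for the remaining parameters
    "precision": "fp16",
    "hardware": "8x A100 GPUs (~7 days)",
}


def joint_pretraining_loss(mlm_loss: torch.Tensor,
                           link_pred_loss: torch.Tensor) -> torch.Tensor:
    """Combine the two self-supervised objectives.

    The paper pretrains with masked language modeling over text and link
    prediction over the KG; an unweighted sum is assumed here purely for
    illustration.
    """
    return mlm_loss + link_pred_loss


if __name__ == "__main__":
    # Dummy scalar losses stand in for the real objectives.
    mlm = torch.tensor(2.31)
    link = torch.tensor(0.87)
    print("joint loss:", joint_pretraining_loss(mlm, link).item())
```

The two learning rates would typically be realized as separate parameter groups in the optimizer (e.g., per-group lr in torch.optim.AdamW); the exact optimizer settings are deferred to the paper's Appendix B.3.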