GreaseLM: Graph REASoning Enhanced Language Models

Authors: Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GREASELM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8× larger.
Researcher Affiliation Academia Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec Stanford University {xikunz2,antoineb,myasu,hyren,pliang,manning,jure}@cs.stanford.edu
Pseudocode No The paper describes its architecture and various operations using mathematical equations (e.g., Eqs. 1-11), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code Yes All code, data and pretrained models are available at https://github.com/snap-stanford/GreaseLM.
Open Datasets Yes We evaluate GREASELM on three diverse multiple-choice question answering datasets across two domains: CommonsenseQA (Talmor et al., 2019) and OpenbookQA (Mihaylov et al., 2018) as commonsense reasoning benchmarks, and MedQA-USMLE (Jin et al., 2021) as a clinical QA task.
Dataset Splits Yes We perform our experiments using the in-house data split of Lin et al. (2019) to compare to baseline methods.
Hardware Specification No The paper describes its model architecture and training process, but it does not specify any hardware details such as GPU or CPU models used for experiments.
Software Dependencies No The paper mentions using specific language models like RoBERTa-Large, AristoRoBERTa, SapBERT, PubMedBERT, and BioBERT, but it does not provide specific version numbers for underlying software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA).
Experiment Setup Yes Table 7: Hyperparameter settings for models and experiments
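
The Pseudocode row above notes that the model is specified only through equations (Eqs. 1-11), not an algorithm block. As a reading aid, the following is a minimal PyTorch sketch of the layer structure those equations describe: an LM block and a GNN block run in parallel, with a small MLP exchanging information between the LM's interaction token and the graph's interaction node. The dimensions, the stand-in linear "GNN" block, and all module names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GreaseLMLayerSketch(nn.Module):
    """Sketch of one GreaseLM-style layer: an LM block and a GNN block run
    side by side, then a two-layer MLP ("MInt") swaps information between
    the interaction token (index 0 of the token sequence) and the
    interaction node (index 0 of the node set). Illustrative only."""

    def __init__(self, d_lm=1024, d_gnn=200):
        super().__init__()
        self.lm_block = nn.TransformerEncoderLayer(d_model=d_lm, nhead=8, batch_first=True)
        self.gnn_block = nn.Linear(d_gnn, d_gnn)  # stand-in for a message-passing layer
        self.mint = nn.Sequential(                # modality-interaction MLP
            nn.Linear(d_lm + d_gnn, d_lm + d_gnn),
            nn.GELU(),
            nn.Linear(d_lm + d_gnn, d_lm + d_gnn),
        )

    def forward(self, tokens, nodes):
        # tokens: (batch, seq, d_lm); nodes: (batch, n_nodes, d_gnn)
        tokens = self.lm_block(tokens)
        nodes = torch.relu(self.gnn_block(nodes))
        # Fuse the two special representations, then split them back apart.
        fused = self.mint(torch.cat([tokens[:, 0], nodes[:, 0]], dim=-1))
        h_int, e_int = fused.split([tokens.size(-1), nodes.size(-1)], dim=-1)
        tokens = torch.cat([h_int.unsqueeze(1), tokens[:, 1:]], dim=1)
        nodes = torch.cat([e_int.unsqueeze(1), nodes[:, 1:]], dim=1)
        return tokens, nodes
```

Stacking several such layers on top of a frozen-or-finetuned LM encoder gives the interleaved text/graph reasoning the paper's equations formalize.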
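The Open Datasets row cites CommonsenseQA, OpenbookQA, and MedQA-USMLE. A hedged sketch of pulling the two commonsense benchmarks from the Hugging Face Hub follows; the dataset IDs are community Hub names (an assumption, not something the paper specifies), and MedQA-USMLE is distributed separately by Jin et al. (2021).

```python
# Hedged sketch: loading two of the three benchmarks from the Hugging Face Hub.
from datasets import load_dataset

csqa = load_dataset("commonsense_qa")      # CommonsenseQA (Talmor et al., 2019)
obqa = load_dataset("openbookqa", "main")  # OpenbookQA (Mihaylov et al., 2018)

print(csqa)                                # DatasetDict with train/validation/test
print(obqa["train"][0]["question_stem"])   # one question stem
```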
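The Dataset Splits row refers to the in-house split of Lin et al. (2019), used because the official CommonsenseQA test labels are hidden: the official training set is partitioned into in-house train and test portions, and the official dev set serves as validation. A minimal sketch of that kind of split is below; the 8,500/1,241 partition sizes are the commonly reported in-house figures and should be checked against the released split files.

```python
# Hedged sketch of an "in-house" split in the style of Lin et al. (2019).
import random

examples = list(range(9741))        # stand-in for the official training examples
rng = random.Random(42)             # fixed seed so the split is reproducible
rng.shuffle(examples)

ih_train, ih_test = examples[:8500], examples[8500:]
print(len(ih_train), len(ih_test))  # 8500 1241
```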
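Since the Software Dependencies row notes that no version numbers are reported, a replication can at least record the versions it actually ran with. A small sketch using the standard-library importlib.metadata follows; the package list (PyTorch, Transformers, torch-geometric) is an assumption about the usual stack for this codebase.

```python
# Hedged sketch: record the dependency versions a replication actually used.
import importlib.metadata as md

for pkg in ("torch", "transformers", "torch-geometric"):
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```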
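Finally, the Experiment Setup row points to Table 7 of the paper. One way to mirror such a hyperparameter table in a replication is a single dataclass config, sketched below; every value is an illustrative placeholder to be replaced with the actual Table 7 settings.

```python
# Hedged sketch: Table 7-style settings captured as one config object.
# Values are ILLUSTRATIVE PLACEHOLDERS, not the paper's reported numbers.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    lm_name: str = "roberta-large"  # encoder named in the paper
    num_gnn_layers: int = 5         # placeholder
    gnn_dim: int = 200              # placeholder
    lr_lm: float = 1e-5             # placeholder: separate LR for the LM...
    lr_gnn: float = 1e-3            # ...and for the graph components
    batch_size: int = 64            # placeholder
    epochs: int = 30                # placeholder

cfg = ExperimentConfig()
print(cfg)
```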