GreaseLM: Graph REASoning Enhanced Language Models
Authors: Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GREASELM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8× larger. |
| Researcher Affiliation | Academia | Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec, Stanford University {xikunz2,antoineb,myasu,hyren,pliang,manning,jure}@cs.stanford.edu |
| Pseudocode | No | The paper describes its architecture and various operations using mathematical equations (e.g., Eqs. 1-11), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | All code, data and pretrained models are available at https://github.com/snap-stanford/GreaseLM. |
| Open Datasets | Yes | We evaluate GREASELM on three diverse multiple-choice question answering datasets across two domains: CommonsenseQA (Talmor et al., 2019) and OpenBookQA (Mihaylov et al., 2018) as commonsense reasoning benchmarks, and MedQA-USMLE (Jin et al., 2021) as a clinical QA task. |
| Dataset Splits | Yes | We perform our experiments using the in-house data split of Lin et al. (2019) to compare to baseline methods. |
| Hardware Specification | No | The paper describes its model architecture and training process, but it does not specify any hardware details such as GPU or CPU models used for experiments. |
| Software Dependencies | No | The paper mentions using specific language models like RoBERTa-Large, AristoRoBERTa, SapBERT, PubmedBERT, and BioBERT, but it does not provide specific version numbers for underlying software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA). |
| Experiment Setup | Yes | Table 7: Hyperparameter settings for models and experiments |