reproducibilityindex.ai

GreaseLM: Graph REASoning Enhanced Language Models

Authors: Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8× larger.
Researcher Affiliation	Academia	Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec, Stanford University {xikunz2,antoineb,myasu,hyren,pliang,manning,jure}@cs.stanford.edu
Pseudocode	No	The paper describes its architecture and various operations using mathematical equations (e.g., Eqs. 1-11), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	Yes	All code, data and pretrained models are available at https://github.com/snap-stanford/GreaseLM.
Open Datasets	Yes	We evaluate GreaseLM on three diverse multiple-choice question answering datasets across two domains: CommonsenseQA (Talmor et al., 2019) and OpenBookQA (Mihaylov et al., 2018) as commonsense reasoning benchmarks, and MedQA-USMLE (Jin et al., 2021) as a clinical QA task.
Dataset Splits	Yes	We perform our experiments using the in-house data split of Lin et al. (2019) to compare to baseline methods.
Hardware Specification	No	The paper describes its model architecture and training process, but it does not specify any hardware details such as GPU or CPU models used for experiments.
Software Dependencies	No	The paper mentions using specific language models such as RoBERTa-Large, AristoRoBERTa, SapBERT, PubMedBERT, and BioBERT, but it does not provide version numbers for underlying software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA).
Experiment Setup	Yes	Table 7: Hyperparameter settings for models and experiments
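The Hardware Specification and Software Dependencies rows are marked "No" because the paper does not state GPU models or library versions. When rerunning the released code, one way to close that gap is to log the environment next to each run. The following is a minimal sketch that assumes Python 3.8+ and, optionally, PyTorch. The function name `collect_environment` and the package list are hypothetical illustrations, not part of the GreaseLM release.

```python
import platform
from importlib import metadata

# Hypothetical package list; edit to match the environment actually used.
PACKAGES = ("torch", "transformers", "torch-geometric")


def collect_environment(packages=PACKAGES):
    """Return interpreter, OS, CPU, package, and (if available) GPU details."""
    env = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor() or "unknown",
    }
    # Installed distribution versions, without importing the packages.
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = "not installed"
    # CUDA and GPU names can only be queried if PyTorch is importable.
    try:
        import torch
    except ImportError:
        env["cuda"] = "unavailable (torch not installed)"
        env["gpus"] = []
    else:
        env["cuda"] = torch.version.cuda or "none"
        env["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return env


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Writing this dictionary to a JSON file in each run's output directory would record the details the two rows flag as missing, without changing any training code.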