Graph Reasoning Transformers for Knowledge-Aware Question Answering

Authors: Ruilin Zhao, Feng Zhao, Liang Hu, Guandong Xu

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments conducted on three knowledge-intensive QA benchmarks show that the GRT outperforms the state-of-the-art KG-augmented QA systems, demonstrating the effectiveness and adaptation of our proposed model." |
| Researcher Affiliation | Academia | Ruilin Zhao (1,3), Feng Zhao (1*), Liang Hu (2), Guandong Xu (3). (1) Natural Language Processing and Knowledge Graph Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; (2) College of Electronic and Information Engineering, Tongji University, Shanghai, China; (3) Data Science and Machine Intelligence Lab, University of Technology Sydney, Sydney, Australia. {ruilinzhao,zhaof}@hust.edu.cn, lianghu@tongji.edu.cn, guandong.xu@uts.edu.au |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | "Our code is available at https://github.com/HUSTNLP-codes/GRT" |
| Open Datasets | Yes | Commonsense QA: "This is a commonsense QA dataset... we adopt the in-house (IH) split (Lin et al. 2019) used in prior studies for evaluations." Openbook QA: "This is a commonsense QA dataset... In this work, we utilize the official data splits (Mihaylov and Frank 2018) for our evaluations." Med QA-USMLE: "This is a medical-domain QA dataset... In this work, we utilize the official data splits (Jin et al. 2020) for evaluation purposes." |
| Dataset Splits | Yes | For Commonsense QA: "we adopt the in-house (IH) split (Lin et al. 2019) used in prior studies for evaluations." For Openbook QA: "we utilize the official data splits (Mihaylov and Frank 2018) for our evaluations." For Med QA-USMLE: "we utilize the official data splits (Jin et al. 2020) for evaluation purposes." The tables also report "IHdev-Acc", implying a development/validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. It only discusses software components and models. |
| Software Dependencies | No | The paper mentions using "RoBERTa-large" and "SapBERT-Base" as LM backbones, along with "Clinical BERT" and "BioBERT" for comparison. However, it does not provide specific version numbers for these models or for any other software libraries (e.g., PyTorch, TensorFlow, Python) that would be needed for a reproducible setup. |
| Experiment Setup | No | The paper describes the datasets and baseline models but does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations within the main text. |
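The "in-house (IH) split" noted above is a common convention for Commonsense QA: because the official test labels are hidden, prior work (Lin et al. 2019) holds out part of the official training data as an internal test set (IHtest) and reuses the official dev set as IHdev. The sketch below illustrates that convention only; the split sizes, ordering, and the function name `make_in_house_split` are illustrative assumptions, not the actual split used by the paper.

```python
# Hypothetical sketch of an "in-house" (IH) split in the style of Lin et al. (2019).
# The exact IHtest size and example ordering follow that work and are NOT
# reproduced here; the values below are placeholders for illustration.

def make_in_house_split(official_train, official_dev, ih_test_size):
    """Carve IHtest out of the official train set; keep the official dev as IHdev."""
    ih_train = official_train[:-ih_test_size]  # remaining examples for training
    ih_test = official_train[-ih_test_size:]   # held-out internal test set
    ih_dev = official_dev                      # official dev reused as IHdev
    return ih_train, ih_dev, ih_test

# Toy usage with synthetic question IDs (not real dataset sizes):
train = [f"q{i}" for i in range(100)]
dev = [f"d{i}" for i in range(10)]
ih_train, ih_dev, ih_test = make_in_house_split(train, dev, ih_test_size=20)
print(len(ih_train), len(ih_dev), len(ih_test))  # 80 10 20
```

The key property of such a split is that IHtrain and IHtest are disjoint subsets of the official training data, which is what makes accuracies like "IHdev-Acc" comparable across prior systems.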