Graph Reasoning Transformers for Knowledge-Aware Question Answering

Authors: Ruilin Zhao, Feng Zhao, Liang Hu, Guandong Xu

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments conducted on three knowledge-intensive QA benchmarks show that the GRT outperforms the state-of-the-art KG-augmented QA systems, demonstrating the effectiveness and adaptation of our proposed model." |
| Researcher Affiliation | Academia | Ruilin Zhao (1,3), Feng Zhao (1*), Liang Hu (2), Guandong Xu (3). (1) Natural Language Processing and Knowledge Graph Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; (2) College of Electronic and Information Engineering, Tongji University, Shanghai, China; (3) Data Science and Machine Intelligence Lab, University of Technology Sydney, Sydney, Australia. {ruilinzhao,zhaof}@hust.edu.cn, lianghu@tongji.edu.cn, guandong.xu@uts.edu.au |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | "Our code is available at https://github.com/HUSTNLP-codes/GRT" |
| Open Datasets | Yes | Commonsense QA: "This is a commonsense QA dataset... we adopt the in-house (IH) split (Lin et al. 2019) used in prior studies for evaluations." Openbook QA: "This is a commonsense QA dataset... In this work, we utilize the official data splits (Mihaylov and Frank 2018) for our evaluations." Med QA-USMLE: "This is a medical-domain QA dataset... In this work, we utilize the official data splits (Jin et al. 2020) for evaluation purposes." |
| Dataset Splits | Yes | For Commonsense QA: "we adopt the in-house (IH) split (Lin et al. 2019) used in prior studies for evaluations." For Openbook QA: "we utilize the official data splits (Mihaylov and Frank 2018) for our evaluations." For Med QA-USMLE: "we utilize the official data splits (Jin et al. 2020) for evaluation purposes." The tables also report "IHdev-Acc", implying a development/validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. It only discusses software components and models. |
| Software Dependencies | No | The paper mentions using "RoBERTa-large" and "SapBERT-Base" as LM backbones, along with "Clinical BERT" and "BioBERT" for comparison. However, it does not provide specific version numbers for these models or for any other software libraries (e.g., PyTorch, TensorFlow, Python) that would be needed for a reproducible setup. |
| Experiment Setup | No | The paper describes the datasets and baseline models but does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations within the main text. |
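The "in-house (IH) split" noted above is a common convention for Commonsense QA: because the official test labels are hidden, prior work (Lin et al. 2019) holds out part of the official training data as an internal test set (IHtest) and reuses the official dev set as IHdev. The sketch below illustrates that convention only; the split sizes, ordering, and the function name `make_in_house_split` are illustrative assumptions, not the actual split used by the paper.

```python
# Hypothetical sketch of an "in-house" (IH) split in the style of Lin et al. (2019).
# The exact IHtest size and example ordering follow that work and are NOT
# reproduced here; the values below are placeholders for illustration.

def make_in_house_split(official_train, official_dev, ih_test_size):
    """Carve IHtest out of the official train set; keep the official dev as IHdev."""
    ih_train = official_train[:-ih_test_size]  # remaining examples for training
    ih_test = official_train[-ih_test_size:]   # held-out internal test set
    ih_dev = official_dev                      # official dev reused as IHdev
    return ih_train, ih_dev, ih_test

# Toy usage with synthetic question IDs (not real dataset sizes):
train = [f"q{i}" for i in range(100)]
dev = [f"d{i}" for i in range(10)]
ih_train, ih_dev, ih_test = make_in_house_split(train, dev, ih_test_size=20)
print(len(ih_train), len(ih_dev), len(ih_test))  # 80 10 20
```

The key property of such a split is that IHtrain and IHtest are disjoint subsets of the official training data, which is what makes accuracies like "IHdev-Acc" comparable across prior systems.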