Knowledge Graph Prompting for Multi-Document Question Answering

Authors: Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design and retrieval augmented generation for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA. ... We compare the MD-QA performance of the proposed KGP-T5 and other baselines in Table 1. |
| Researcher Affiliation | Collaboration | Yu Wang (1), Nedim Lipka (2), Ryan A. Rossi (2), Alexa Siu (2), Ruiyi Zhang (2), Tyler Derr (1); (1) Vanderbilt University, Nashville, USA; (2) Adobe Research, San Jose, USA |
| Pseudocode | Yes | Algorithm 1: LLM-based KG Traversal Algorithm to Retrieve Relevant Context for Content-based Question. |
| Open Source Code | Yes | Our code: https://github.com/YuWVandy/KG-LLM-MDQA. |
| Open Datasets | Yes | we randomly sample multi-document questions from the development set of 2WikiMQA (Ho et al. 2020) and MuSiQue (Trivedi et al. 2022b)... We randomly sample questions from HotpotQA and construct KGs over the set of documents for each of these questions using our proposed methods. |
| Dataset Splits | No | The paper mentions sampling from the 'development set' of some datasets and using HotpotQA, but it does not state the train/validation/test splits (e.g., percentages or sample counts) used in its experiments. |
| Hardware Specification | No | The paper does not mention the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., T5, LLaMA, RoBERTa-base, TF-IDF, Extract-PDF API) but does not give version numbers for these dependencies. |
| Experiment Setup | No | The paper states, 'Detailed experimental setting is presented in Section 13. Due to the space limitation, we comprehensively introduce our experimental setting, including dataset collection, baselines, and evaluation criteria, in Supplementary 8.1-8.2.' These details are deferred to a supplementary document that is not part of the main paper. |