Knowledge Graph Prompting for Multi-Document Question Answering
Authors: Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design and retrieval augmented generation for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA. ... We compare the MD-QA performance of the proposed KGP-T5 and other baselines in Table 1. |
| Researcher Affiliation | Collaboration | Yu Wang,1 Nedim Lipka,2 Ryan A. Rossi,2 Alexa Siu,2 Ruiyi Zhang,2 Tyler Derr1 1 Vanderbilt University, Nashville, USA 2 Adobe Research, San Jose, USA |
| Pseudocode | Yes | Algorithm 1: LLM-based KG Traversal Algorithm to Retrieve Relevant Context for Content-based Question. |
| Open Source Code | Yes | Our code: https://github.com/YuWVandy/KG-LLM-MDQA. |
| Open Datasets | Yes | we randomly sample multi-document questions from the development set of 2WikiMQA (Ho et al. 2020) and MuSiQue (Trivedi et al. 2022b)... We randomly sample questions from HotpotQA and construct KGs over the set of documents for each of these questions using our proposed methods. |
| Dataset Splits | No | The paper mentions sampling from the 'development set' for some datasets and sampling questions from HotpotQA. However, it does not explicitly report the train/validation/test splits (e.g., percentages or sample counts) used in its experiments. |
| Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., T5, LLaMA, RoBERTa-base, TF-IDF, Extract-PDF API) but does not provide specific version numbers for these dependencies. |
| Experiment Setup | No | The paper states, 'Detailed experimental setting is presented in Section 13. Due to the space limitation, we comprehensively introduce our experimental setting, including dataset collection, baselines, and evaluation criteria, in Supplementary 8.1-8.2.' These details are deferred to a supplementary document that is not included in the main paper. |