RPA: Reasoning Path Augmentation in Iterative Retrieving for Multi-Hop QA
Authors: Ziyi Cao, Bingquan Liu, Shaobo Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We build RPA with a naive pre-trained model and evaluate RPA on the QASC and MultiRC datasets. The evaluation results demonstrate that RPA outperforms previously published reasoning path retrieval methods, showing the effectiveness of the proposed methods. Moreover, we present detailed experiments on how the orders of justifications and the percent of augmented paths affect the question-answering performance, revealing the importance of polishing RPs and the necessity of augmentation. |
| Researcher Affiliation | Academia | Harbin Institute of Technology zyc@stu.hit.edu.cn, liubq@hit.edu.cn, shli@insun.hit.edu.cn |
| Pseudocode | No | The paper describes its approach and pipeline using text and figures, but no explicit pseudocode blocks or algorithm listings are provided. |
| Open Source Code | No | The paper does not contain any statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluated our method on two datasets: Question Answering via Sentence Composition (QASC), a large KB-based multiple-choice QA task (Khot et al. 2020). ... Multi-Sentence Reading Comprehension (MultiRC), a reading comprehension dataset consists of multiple choices QA (Khashabi et al. 2018a). In the development set, every question with 2-to-14 answer candidates is supported with a paragraph, which contains 2-to-4 justifications. The dataset we use is the original MultiRC...The original MultiRC contains the training, development, and hidden test set, out of which the training and development set is used in the paper. |
| Dataset Splits | No | The paper mentions using the 'training', 'development', and 'test' sets of the MultiRC dataset, and the 'development set' for QASC, implying pre-defined splits. However, it does not explicitly state the split percentages or exact sample counts within the paper itself. |
| Hardware Specification | No | The paper describes training parameters and models used (e.g., RoBERTa-Large) but does not provide any specific details about the hardware (e.g., GPU models, CPU specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'RoBERTa-Large', 'Lamb' optimizer, and 'Lucene', but it does not specify version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Hyperparameters of RPA, whose basic structure is shown in Figure 6, are from the Trainer that is with saving steps of 2 and epoch of 500 saving steps, fine-tuned from RoBERTa-Large (Liu et al. 2019; Wolf et al. 2020; Xiong et al. 2021). More specifically, we trained with batch size of 20 in QASC and 4 in MultiRC, chunk size of 50, and the optimizer of Lamb (You et al. 2020), whose learning rate is 5e-6. For training, in QASC, we uniformly sampled 5 negatives from ANN top 100, and in MultiRC, sampled 2 negatives from all sentences (the number of f in FC varies from 6 to 20 in training data). In the answer classifier, we used batch size 2, maximum sequence length 256 for QASC. |
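Since the paper releases no code, the hyperparameters quoted in the Experiment Setup row can be collected into a small reference sketch. The dict layout and key names below are illustrative assumptions, not taken from any RPA implementation; only the values come from the paper.

```python
# Training hyperparameters reported in the RPA paper (AAAI 2023),
# gathered as plain Python dicts for quick reference. Key names are
# hypothetical conveniences, not identifiers from released code.

COMMON = {
    "base_model": "RoBERTa-Large",   # Liu et al. 2019
    "optimizer": "Lamb",             # You et al. 2020
    "learning_rate": 5e-6,
    "chunk_size": 50,
}

PER_DATASET = {
    "QASC": {
        "train_batch_size": 20,
        "negatives": 5,              # uniformly sampled from ANN top 100
        "classifier_batch_size": 2,
        "classifier_max_seq_length": 256,
    },
    "MultiRC": {
        "train_batch_size": 4,
        "negatives": 2,              # sampled from all sentences
    },
}

def config_for(dataset: str) -> dict:
    """Merge the shared settings with a dataset's overrides."""
    return {**COMMON, **PER_DATASET[dataset]}
```

For example, `config_for("MultiRC")` yields the RoBERTa-Large/Lamb settings with batch size 4 and 2 sampled negatives, matching the quoted setup.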