RPA: Reasoning Path Augmentation in Iterative Retrieving for Multi-Hop QA
Authors: Ziyi Cao, Bingquan Liu, Shaobo Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We build RPA with a naive pre-trained model and evaluate RPA on the QASC and MultiRC datasets. The evaluation results demonstrate that RPA outperforms previously published reasoning path retrieval methods, showing the effectiveness of the proposed methods. Moreover, we present detailed experiments on how the orders of justifications and the percent of augmented paths affect the question-answering performance, revealing the importance of polishing RPs and the necessity of augmentation. |
| Researcher Affiliation | Academia | Harbin Institute of Technology zyc@stu.hit.edu.cn, liubq@hit.edu.cn, shli@insun.hit.edu.cn |
| Pseudocode | No | The paper describes its approach and pipeline using text and figures, but no explicit pseudocode blocks or algorithm listings are provided. |
| Open Source Code | No | The paper does not contain any statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluated our method on two datasets: Question Answering via Sentence Composition (QASC), a large KB-based multiple-choice QA task (Khot et al. 2020). ... Multi-Sentence Reading Comprehension (MultiRC), a reading comprehension dataset consists of multiple choices QA (Khashabi et al. 2018a). In the development set, every question with 2-to-14 answer candidates is supported with a paragraph, which contains 2-to-4 justifications. The dataset we use is the original MultiRC...The original MultiRC contains the training, development, and hidden test set, out of which the training and development set is used in the paper. |
| Dataset Splits | No | The paper mentions using the 'training', 'development', and 'test' sets of the MultiRC dataset, and the 'development set' for QASC, implying pre-defined splits. However, it does not explicitly state the split percentages or exact sample counts within the paper itself. |
| Hardware Specification | No | The paper describes training parameters and models used (e.g., RoBERTa-Large) but does not provide any specific details about the hardware (e.g., GPU models, CPU specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'RoBERTa-Large', 'Lamb' optimizer, and 'Lucene', but it does not specify version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Hyperparameters of RPA, whose basic structure is shown in Figure 6, are from the Trainer that is with saving steps of 2 and epoch of 500 saving steps, fine-tuned from RoBERTa-Large (Liu et al. 2019; Wolf et al. 2020; Xiong et al. 2021). More specifically, we trained with batch size of 20 in QASC and 4 in MultiRC, chunk size of 50, and the optimizer of Lamb (You et al. 2020), whose learning rate is 5e-6. For training, in QASC, we uniformly sampled 5 negatives from ANN top 100, and in MultiRC, sampled 2 negatives from all sentences (the number of f in FC varies from 6 to 20 in training data). In the answer classifier, we used batch size 2, maximum sequence length 256 for QASC. |
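Since the paper releases no code, the hyperparameters quoted in the Experiment Setup row can be collected into a small reference sketch. The dict layout and key names below are illustrative assumptions, not taken from any RPA implementation; only the values come from the paper.

```python
# Training hyperparameters reported in the RPA paper (AAAI 2023),
# gathered as plain Python dicts for quick reference. Key names are
# hypothetical conveniences, not identifiers from released code.

COMMON = {
    "base_model": "RoBERTa-Large",   # Liu et al. 2019
    "optimizer": "Lamb",             # You et al. 2020
    "learning_rate": 5e-6,
    "chunk_size": 50,
}

PER_DATASET = {
    "QASC": {
        "train_batch_size": 20,
        "negatives": 5,              # uniformly sampled from ANN top 100
        "classifier_batch_size": 2,
        "classifier_max_seq_length": 256,
    },
    "MultiRC": {
        "train_batch_size": 4,
        "negatives": 2,              # sampled from all sentences
    },
}

def config_for(dataset: str) -> dict:
    """Merge the shared settings with a dataset's overrides."""
    return {**COMMON, **PER_DATASET[dataset]}
```

For example, `config_for("MultiRC")` yields the RoBERTa-Large/Lamb settings with batch size 4 and 2 sampled negatives, matching the quoted setup.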