reproducibilityindex.ai

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

Authors: Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results show state-of-the-art results in three open-domain QA datasets, showcasing the effectiveness and robustness of our method. Notably, our method achieves significant improvement in HotpotQA, outperforming the previous best model by more than 14 points.
Researcher Affiliation	Collaboration	University of Washington; Salesforce Research; Allen Institute for Artificial Intelligence. {akari,hannaneh}@cs.washington.edu, {k.hashimoto,rsocher,cxiong}@salesforce.com
Pseudocode	No	No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code	Yes	Our code and data are available at https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths.
Open Datasets	Yes	We evaluate our method in three open-domain Wikipedia-sourced datasets: HotpotQA, SQuAD Open, and Natural Questions Open.
Dataset Splits	Yes	The HotpotQA training, development, and test datasets contain 90,564, 7,405 and 7,405 questions, respectively.
Hardware Specification	No	The paper states, 'our retriever can be handled on a single GPU machine,' but does not specify any exact GPU model, CPU model, or other detailed hardware specifications.
Software Dependencies	No	The paper mentions 'pytorch-transformers' and 'PyTorch' as software used, and 'Adam optimizer' for optimization, but specific version numbers for these software components are not provided.
Experiment Setup	Yes	To train our recurrent retriever, we set the learning rate to 3 × 10^-5, and the maximum number of the training epochs to three. The mini-batch size is four; a mini-batch example consists of a question with its corresponding paragraphs. To train our reader model, we set the learning rate to 3 × 10^-5, and the maximum number of training epochs to two. Empirically we observe better performance with a larger batch size as discussed in previous work (Liu et al., 2019; Ott et al., 2018), and thus we set the mini-batch size to 120.
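
For anyone attempting a reproduction, the hyperparameters quoted in the Experiment Setup row can be collected in one place. The following is a minimal sketch, not the authors' configuration code: the `TrainConfig` class, its field names, and the `steps_per_epoch` helper are hypothetical, and the step counts assume exactly one training example per HotpotQA training question, which the paper confirms only loosely for the retriever and not at all for the reader.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    learning_rate: float
    max_epochs: int
    batch_size: int

    def steps_per_epoch(self, num_examples: int) -> int:
        # Number of mini-batches needed to cover num_examples once.
        return math.ceil(num_examples / self.batch_size)


# Values reported in the paper for the recurrent retriever and the reader.
RETRIEVER = TrainConfig(learning_rate=3e-5, max_epochs=3, batch_size=4)
READER = TrainConfig(learning_rate=3e-5, max_epochs=2, batch_size=120)

# HotpotQA training-set size from the Dataset Splits row.
HOTPOTQA_TRAIN_QUESTIONS = 90_564

# ASSUMPTION: one training example per question. The real example counts
# may differ, especially for the reader.
retriever_steps = RETRIEVER.steps_per_epoch(HOTPOTQA_TRAIN_QUESTIONS)
reader_steps = READER.steps_per_epoch(HOTPOTQA_TRAIN_QUESTIONS)

print(f"retriever: {retriever_steps} steps/epoch, reader: {reader_steps} steps/epoch")
```

Under that assumption, the retriever would take 22,641 optimizer steps per epoch and the reader 755. The paper names Adam as the optimizer but gives no library versions, so any remaining optimizer settings would have to come from the released code.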