Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

Authors: Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show state-of-the-art results in three open-domain QA datasets, showcasing the effectiveness and robustness of our method. Notably, our method achieves significant improvement in HotpotQA, outperforming the previous best model by more than 14 points."
Researcher Affiliation | Collaboration | University of Washington, Salesforce Research, Allen Institute for Artificial Intelligence; {akari,hannaneh}@cs.washington.edu, {k.hashimoto,rsocher,cxiong}@salesforce.com
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | "Our code and data are available at https://github.com/AkariAsai/learning_to_retrieve_reasoning_paths."
Open Datasets | Yes | "We evaluate our method in three open-domain Wikipedia-sourced datasets: HotpotQA, SQuAD Open and Natural Questions Open."
Dataset Splits | Yes | "The HotpotQA training, development, and test datasets contain 90,564, 7,405 and 7,405 questions, respectively."
Hardware Specification | No | The paper states, 'our retriever can be handled on a single GPU machine,' but does not specify any exact GPU model, CPU model, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions 'pytorch-transformers' and 'PyTorch' as software used, and 'Adam optimizer' for optimization, but specific version numbers for these software components are not provided.
Experiment Setup | Yes | "To train our recurrent retriever, we set the learning rate to 3 x 10^-5, and the maximum number of the training epochs to three. The mini-batch size is four; a mini-batch example consists of a question with its corresponding paragraphs. To train our reader model, we set the learning rate to 3 x 10^-5, and the maximum number of training epochs to two. Empirically we observe better performance with a larger batch size as discussed in previous work (Liu et al., 2019; Ott et al., 2018), and thus we set the mini-batch size to 120."
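
As a concrete reading of the hyperparameters quoted in the Experiment Setup row, the sketch below wires the reported learning rate, epoch counts, and mini-batch sizes into a generic PyTorch fine-tuning loop. Only those numbers come from the quoted text; the bert-base-uncased encoder, the plain torch.optim.Adam call (the excerpt names Adam but gives no optimizer schedule), and the dataset and compute_loss placeholders are illustrative assumptions, not details taken from the paper.

    # Hedged sketch: only the learning rates, epoch counts, and batch sizes
    # below are taken from the quoted setup; everything else (encoder choice,
    # optimizer variant, data pipeline, loss) is a placeholder assumption.
    import torch
    from torch.utils.data import DataLoader
    from transformers import AutoModel  # assumption: a BERT-style encoder is fine-tuned

    CONFIGS = {
        # component: hyperparameters as quoted above
        "retriever": {"lr": 3e-5, "epochs": 3, "batch_size": 4},
        "reader":    {"lr": 3e-5, "epochs": 2, "batch_size": 120},
    }

    def finetune(component, dataset, compute_loss):
        """Generic fine-tuning loop driven by the reported hyperparameters.

        `dataset` and `compute_loss` are hypothetical stand-ins for the
        component-specific data pipeline and training objective.
        """
        cfg = CONFIGS[component]
        model = AutoModel.from_pretrained("bert-base-uncased")  # assumption: BERT encoder
        optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
        loader = DataLoader(dataset, batch_size=cfg["batch_size"], shuffle=True)

        model.train()
        for _ in range(cfg["epochs"]):
            for batch in loader:
                optimizer.zero_grad()
                loss = compute_loss(model, batch)
                loss.backward()
                optimizer.step()
        return model

As the Hardware Specification row notes, the paper only states that the retriever fits on a single GPU; how the reader's mini-batch size of 120 is realized in practice (for example, gradient accumulation or multiple GPUs) is not specified in these excerpts.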