Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
Authors: Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct analysis and show that iterative interaction helps in retrieving informative paragraphs from the corpus. Finally, we show that our multi-step-reasoning framework brings consistent improvement when applied to two widely used reader architectures (DrQA and BiDAF) on various large open-domain datasets: TriviaQA-unfiltered, Quasar-T, SearchQA, and SQuAD-open. |
| Researcher Affiliation | Collaboration | Rajarshi Das (1), Shehzaad Dhuliawala (2), Manzil Zaheer (3) & Andrew McCallum (1); {rajarshi,mccallum}@cs.umass.edu, shehzaad.dhuliawala@microsoft.com, manzil@zaheer.ml; (1) University of Massachusetts, Amherst; (2) Microsoft Research, Montréal; (3) Google AI, Mountain View |
| Pseudocode | Yes | Algorithm 1: Multi-step reasoning for open-domain QA (a hedged sketch of this loop appears below the table) |
| Open Source Code | Yes | 1Code and pretrained models are available at https://github.com/rajarshd/Multi-Step-Reasoning |
| Open Datasets | Yes | We experiment on the following large open-domain QA datasets: (a) TriviaQA-unfiltered, a version of TriviaQA (Joshi et al., 2017) built for open-domain QA. (c) SearchQA (Dunn et al., 2017) is another open-domain dataset... (d) Quasar-T (Dhingra et al., 2017)... (e) SQuAD-open: We also experimented on the open-domain version of the SQuAD dataset. For fair comparison to baselines, our evidence corpus was created by retrieving the top-5 Wikipedia documents as returned by the pipeline of Chen et al. (2017). |
| Dataset Splits | No | The paper mentions 'development set' and 'test set' but does not provide specific percentages or counts for training, validation, and test splits across all datasets, nor does it explicitly cite a source for predefined splits. |
| Hardware Specification | Yes | To test for scalability, we increase the number of paragraphs ranging from 500 to 100 million and test on a single Titan-X GPU. |
| Software Dependencies | No | The paper mentions algorithms and frameworks used (e.g., Adam, LSTM, GRU, DrQA, BiDAF) but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The number of layers of the bi-directional LSTM encoder is set to three and we use Adam (Kingma & Ba, 2014) for optimization. |
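
As referenced in the Pseudocode row, the paper's Algorithm 1 is an iterative retriever-reader loop. The following is a minimal sketch of that loop, assuming hypothetical components: `multi_step_qa`, `reader`, and `reformulate` are illustrative names and signatures, not the authors' released implementation; `num_steps` and `top_k` are placeholder hyperparameters.

```python
import numpy as np

def multi_step_qa(question_vec, paragraph_vecs, paragraphs, reader, reformulate,
                  num_steps=3, top_k=5):
    """Hedged sketch of multi-step retriever-reader interaction (Algorithm 1).

    question_vec   : 1-D query embedding from the question encoder
    paragraph_vecs : (num_paragraphs, dim) matrix of precomputed paragraph embeddings
    paragraphs     : raw paragraph texts aligned with paragraph_vecs
    reader         : callable (question_vec, paragraphs) -> (answer_span, score, reader_state)
    reformulate    : callable (query_vec, reader_state) -> new query_vec
                     (stand-in for the paper's gated query-reformulation step)
    """
    best_answer, best_score = None, -np.inf
    query = question_vec
    for _ in range(num_steps):
        # Retrieval step: score every paragraph by inner product with the
        # current query vector and keep the top-k.
        scores = paragraph_vecs @ query
        top_idx = np.argsort(-scores)[:top_k]
        retrieved = [paragraphs[i] for i in top_idx]

        # Reading step: the reader extracts an answer span, its score, and an
        # internal state that is passed back to the retriever.
        answer, score, reader_state = reader(question_vec, retrieved)
        if score > best_score:
            best_answer, best_score = answer, score

        # Query reformulation: update the query vector from the reader state so
        # the next retrieval step can fetch different evidence paragraphs.
        query = reformulate(query, reader_state)
    return best_answer, best_score
```

This sketch omits training details and the fast nearest-neighbor retrieval over precomputed paragraph vectors that the paper relies on for the scalability experiment noted in the Hardware Specification row (corpora of up to 100 million paragraphs).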