Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
Authors: Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct analysis and show that iterative interaction helps in retrieving informative paragraphs from the corpus. Finally, we show that our multi-step-reasoning framework brings consistent improvement when applied to two widely used reader architectures (DrQA and BiDAF) on various large open-domain datasets: TriviaQA-unfiltered, Quasar-T, SearchQA, and SQuAD-open. |
| Researcher Affiliation | Collaboration | Rajarshi Das (1), Shehzaad Dhuliawala (2), Manzil Zaheer (3) & Andrew McCallum (1); {rajarshi,mccallum}@cs.umass.edu, shehzaad.dhuliawala@microsoft.com, manzil@zaheer.ml; (1) University of Massachusetts, Amherst; (2) Microsoft Research, Montréal; (3) Google AI, Mountain View |
| Pseudocode | Yes | Algorithm 1: Multi-step reasoning for open-domain QA (a hedged sketch of this loop appears below the table) |
| Open Source Code | Yes | 1Code and pretrained models are available at https://github.com/rajarshd/Multi-Step-Reasoning |
| Open Datasets | Yes | We experiment on the following large open-domain QA datasets: (a) TriviaQA-unfiltered, a version of TriviaQA (Joshi et al., 2017) built for open-domain QA. (c) SearchQA (Dunn et al., 2017) is another open-domain dataset... (d) Quasar-T (Dhingra et al., 2017)... (e) SQuAD-open: We also experimented on the open-domain version of the SQuAD dataset. For fair comparison to baselines, our evidence corpus was created by retrieving the top-5 Wikipedia documents as returned by the pipeline of Chen et al. (2017). |
| Dataset Splits | No | The paper mentions 'development set' and 'test set' but does not provide specific percentages or counts for training, validation, and test splits across all datasets, nor does it explicitly cite a source for predefined splits. |
| Hardware Specification | Yes | To test for scalability, we increase the number of paragraphs ranging from 500 to 100 million and test on a single Titan-X GPU. |
| Software Dependencies | No | The paper mentions algorithms and frameworks used (e.g., Adam, LSTM, GRU, DrQA, BiDAF) but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The number of layers of the bi-directional LSTM encoder is set to three and we use Adam (Kingma & Ba, 2014) for optimization. |
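
As referenced in the Pseudocode row, the paper's Algorithm 1 is an iterative retriever-reader loop. The following is a minimal sketch of that loop, assuming hypothetical components: `multi_step_qa`, `reader`, and `reformulate` are illustrative names and signatures, not the authors' released implementation; `num_steps` and `top_k` are placeholder hyperparameters.

```python
import numpy as np

def multi_step_qa(question_vec, paragraph_vecs, paragraphs, reader, reformulate,
                  num_steps=3, top_k=5):
    """Hedged sketch of multi-step retriever-reader interaction (Algorithm 1).

    question_vec   : 1-D query embedding from the question encoder
    paragraph_vecs : (num_paragraphs, dim) matrix of precomputed paragraph embeddings
    paragraphs     : raw paragraph texts aligned with paragraph_vecs
    reader         : callable (question_vec, paragraphs) -> (answer_span, score, reader_state)
    reformulate    : callable (query_vec, reader_state) -> new query_vec
                     (stand-in for the paper's gated query-reformulation step)
    """
    best_answer, best_score = None, -np.inf
    query = question_vec
    for _ in range(num_steps):
        # Retrieval step: score every paragraph by inner product with the
        # current query vector and keep the top-k.
        scores = paragraph_vecs @ query
        top_idx = np.argsort(-scores)[:top_k]
        retrieved = [paragraphs[i] for i in top_idx]

        # Reading step: the reader extracts an answer span, its score, and an
        # internal state that is passed back to the retriever.
        answer, score, reader_state = reader(question_vec, retrieved)
        if score > best_score:
            best_answer, best_score = answer, score

        # Query reformulation: update the query vector from the reader state so
        # the next retrieval step can fetch different evidence paragraphs.
        query = reformulate(query, reader_state)
    return best_answer, best_score
```

This sketch omits training details and the fast nearest-neighbor retrieval over precomputed paragraph vectors that the paper relies on for the scalability experiment noted in the Hardware Specification row (corpora of up to 100 million paragraphs).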