Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Authors: Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets. |
| Researcher Affiliation | Collaboration | ¹School of Information Systems, Singapore Management University; ²AI Foundations Learning, IBM Research AI |
| Pseudocode | No | The paper describes its methods through prose and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released under https://github.com/shuohangwang/mprc. |
| Open Datasets | Yes | We conduct experiments on three publicly available open-domain QA datasets, namely, Quasar-T (Dhingra et al., 2017b), SearchQA (Dunn et al., 2017) and TriviaQA (Joshi et al., 2017). ... Quasar-T (Dhingra et al., 2017b) https://github.com/bdhingra/quasar ... SearchQA (Dunn et al., 2017) https://github.com/nyu-dl/SearchQA ... TriviaQA (Open-Domain Setting) (Joshi et al., 2017) http://nlp.cs.washington.edu/triviaqa/data/triviaqa-unfiltered.tar.gz |
| Dataset Splits | Yes | The statistics of the three datasets are shown in Table 1. ... Table 1: Statistics of the datasets. #q(train) #q(dev) #q(test) ... Quasar-T 28,496 3,000 3,000 |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software like GloVe for word embeddings and Adam for optimization, but does not specify version numbers for these or any other key software dependencies (e.g., Python, deep learning frameworks). |
| Experiment Setup | Yes | For the coverage-based re-ranker, we use Adam (Kingma & Ba, 2015) to optimize the model. Word embeddings are initialized by GloVe (Pennington et al., 2014) and are not updated during training. We set all the words beyond GloVe as zero vectors. We set l to 300, batch size to 30, learning rate to 0.002. We tune the dropout probability from 0 to 0.5 and the number of candidate answers for re-ranking (K) in [3, 5, 10]. |
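The reported setup (frozen GloVe embeddings with zero vectors for out-of-vocabulary words, plus a small tuning grid over dropout and K) can be sketched as below. This is an illustrative sketch only, not the authors' released code; the function names, the dropout step size, and the toy GloVe dictionary are assumptions.

```python
import itertools
import numpy as np

# Fixed hyperparameters as reported in the paper.
L_DIM = 300          # embedding / hidden dimension l
BATCH_SIZE = 30
LEARNING_RATE = 0.002  # used with the Adam optimizer

# Tuned settings: dropout "from 0 to 0.5" (step size is an assumption)
# and the number of candidate answers K for re-ranking.
DROPOUT_GRID = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
K_GRID = [3, 5, 10]


def build_embedding_matrix(vocab, glove):
    """Initialize word vectors from GloVe; words missing from GloVe
    become zero vectors. The matrix is kept fixed during training."""
    emb = np.zeros((len(vocab), L_DIM), dtype=np.float32)
    for idx, word in enumerate(vocab):
        if word in glove:
            emb[idx] = glove[word]
    return emb


def hyperparameter_grid():
    """Enumerate the (dropout, K) combinations tuned in the paper."""
    return list(itertools.product(DROPOUT_GRID, K_GRID))
```

With the six dropout values above, the grid covers 18 (dropout, K) combinations; each would be trained with Adam at the fixed learning rate and batch size.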