Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Authors: Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets. |
| Researcher Affiliation | Collaboration | ¹School of Information Systems, Singapore Management University; ²AI Foundations Learning, IBM Research AI |
| Pseudocode | No | The paper describes its methods through prose and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be released under https://github.com/shuohangwang/mprc. |
| Open Datasets | Yes | We conduct experiments on three publicly available open-domain QA datasets, namely, Quasar-T (Dhingra et al., 2017b), SearchQA (Dunn et al., 2017) and TriviaQA (Joshi et al., 2017). ... Quasar-T (Dhingra et al., 2017b) https://github.com/bdhingra/quasar ... SearchQA (Dunn et al., 2017) https://github.com/nyu-dl/SearchQA ... TriviaQA (Open-Domain Setting) (Joshi et al., 2017) http://nlp.cs.washington.edu/triviaqa/data/triviaqa-unfiltered.tar.gz |
| Dataset Splits | Yes | The statistics of the three datasets are shown in Table 1. ... Table 1: Statistics of the datasets. #q(train) #q(dev) #q(test) ... Quasar-T 28,496 3,000 3,000 |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software like GloVe for word embeddings and Adam for optimization, but does not specify version numbers for these or any other key software dependencies (e.g., Python, deep learning frameworks). |
| Experiment Setup | Yes | For the coverage-based re-ranker, we use Adam (Kingma & Ba, 2015) to optimize the model. Word embeddings are initialized by GloVe (Pennington et al., 2014) and are not updated during training. We set all the words beyond GloVe as zero vectors. We set l to 300, batch size to 30, learning rate to 0.002. We tune the dropout probability from 0 to 0.5 and the number of candidate answers for re-ranking (K) in [3, 5, 10]. |
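The reported setup (frozen GloVe embeddings with zero vectors for out-of-vocabulary words, plus a small tuning grid over dropout and K) can be sketched as below. This is an illustrative sketch only, not the authors' released code; the function names, the dropout step size, and the toy GloVe dictionary are assumptions.

```python
import itertools
import numpy as np

# Fixed hyperparameters as reported in the paper.
L_DIM = 300          # embedding / hidden dimension l
BATCH_SIZE = 30
LEARNING_RATE = 0.002  # used with the Adam optimizer

# Tuned settings: dropout "from 0 to 0.5" (step size is an assumption)
# and the number of candidate answers K for re-ranking.
DROPOUT_GRID = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
K_GRID = [3, 5, 10]


def build_embedding_matrix(vocab, glove):
    """Initialize word vectors from GloVe; words missing from GloVe
    become zero vectors. The matrix is kept fixed during training."""
    emb = np.zeros((len(vocab), L_DIM), dtype=np.float32)
    for idx, word in enumerate(vocab):
        if word in glove:
            emb[idx] = glove[word]
    return emb


def hyperparameter_grid():
    """Enumerate the (dropout, K) combinations tuned in the paper."""
    return list(itertools.product(DROPOUT_GRID, K_GRID))
```

With the six dropout values above, the grid covers 18 (dropout, K) combinations; each would be trained with Adam at the fixed learning rate and batch size.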