Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

Authors: Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets.
Researcher Affiliation | Collaboration | (1) School of Information Systems, Singapore Management University; (2) AI Foundations - Learning, IBM Research AI
Pseudocode | No | The paper describes its methods through prose and mathematical equations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code will be released under https://github.com/shuohangwang/mprc.
Open Datasets | Yes | We conduct experiments on three publicly available open-domain QA datasets, namely, Quasar-T (Dhingra et al., 2017b), SearchQA (Dunn et al., 2017) and TriviaQA (Joshi et al., 2017). ... Quasar-T (Dhingra et al., 2017b): https://github.com/bdhingra/quasar ... SearchQA (Dunn et al., 2017): https://github.com/nyu-dl/SearchQA ... TriviaQA, open-domain setting (Joshi et al., 2017): http://nlp.cs.washington.edu/triviaqa/data/triviaqa-unfiltered.tar.gz
Dataset Splits | Yes | The statistics of the three datasets are shown in Table 1. ... Table 1: Statistics of the datasets. #q(train) / #q(dev) / #q(test) ... Quasar-T: 28,496 / 3,000 / 3,000
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions software like GloVe for word embeddings and Adam for optimization, but does not specify version numbers for these or any other key software dependencies (e.g., Python, deep learning frameworks).
Experiment Setup | Yes | For the coverage-based re-ranker, we use Adam (Kingma & Ba, 2015) to optimize the model. Word embeddings are initialized by GloVe (Pennington et al., 2014) and are not updated during training. We set all the words beyond GloVe as zero vectors. We set l to 300, batch size to 30, learning rate to 0.002. We tune the dropout probability from 0 to 0.5 and the number of candidate answers for re-ranking (K) in [3, 5, 10].
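The setup quoted above can be sketched in code. This is a minimal illustration of the stated hyperparameters and the embedding-initialization rule only; the paper does not name a deep-learning framework, so plain NumPy is used here, and the function and variable names (`build_embedding_matrix`, `DROPOUT_GRID`, the toy vocabulary) are hypothetical, not from the released `mprc` code.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
HIDDEN_SIZE = 300          # "l" in the paper
BATCH_SIZE = 30
LEARNING_RATE = 0.002      # for the Adam optimizer
DROPOUT_RANGE = (0.0, 0.5) # dropout probability is tuned in this range
K_CANDIDATES = [3, 5, 10]  # number of candidate answers re-ranked

def build_embedding_matrix(vocab, glove):
    """Initialize word vectors from pretrained GloVe embeddings.

    Words not covered by GloVe are set to zero vectors, and the
    resulting matrix is kept frozen (not updated) during training.
    """
    emb = np.zeros((len(vocab), HIDDEN_SIZE), dtype=np.float32)
    for i, word in enumerate(vocab):
        if word in glove:       # copy the pretrained vector when available
            emb[i] = glove[word]
    return emb                  # caller treats this as non-trainable

# Toy usage with two "pretrained" words and one out-of-vocabulary word.
glove = {"cat": np.ones(HIDDEN_SIZE), "dog": np.full(HIDDEN_SIZE, 2.0)}
emb = build_embedding_matrix(["cat", "dog", "unseen_word"], glove)
```

The zero-vector rule for out-of-vocabulary words means unseen tokens contribute nothing to the initial representation, which matches the quoted sentence "We set all the words beyond GloVe as zero vectors."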