Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Authors: Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li (pp. 6529–6537)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on the SQuAD 2.0 dataset show that our system obtains a score of 74.2 F1 on the test set, achieving state-of-the-art results at the time of submission (Aug. 28th, 2018).
Researcher Affiliation | Collaboration | 1) College of Computer, National University of Defense Technology; 2) Microsoft Research Asia
Pseudocode | No | The paper includes architectural diagrams (Figure 2) but no formal pseudocode blocks or algorithms.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate our approach on the SQuAD 2.0 dataset (Rajpurkar, Jia, and Liang 2018).
Dataset Splits | Yes | We tune this threshold to maximize F1 score on the development set, and report both EM (Exact Match) and F1 metrics.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using "the nltk tokenizer" but does not provide a specific version number for nltk or any other software dependency.
Experiment Setup | Yes | We run a grid search on γ and λ among [0.1, 0.3, 0.5, 0.7, 1, 2]. Based on the performance on the development set, we set γ to 0.3 and λ to 1. ... For Model-II, the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0008 is used, the hidden size is set to 300, and a dropout (Srivastava et al. 2014) of 0.3 is applied to prevent overfitting. The batch size is 48 for the reader, 64 for Model-II, and 32 for Model-I as well as Model-III.
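The grid search over the loss weights γ and λ described under Experiment Setup can be sketched as below. This is a minimal illustration, not the authors' code: `train_and_eval` is a hypothetical stand-in for training the system with a given (γ, λ) pair and returning its dev-set F1.

```python
import itertools

# Candidate values for both loss weights, as reported in the paper.
GRID = [0.1, 0.3, 0.5, 0.7, 1, 2]

def grid_search(train_and_eval):
    """Try every (gamma, lambda) pair and keep the one with the best
    dev-set F1. train_and_eval is a hypothetical callable standing in
    for a full train-then-evaluate run."""
    best_score, best_cfg = float("-inf"), None
    for gamma, lam in itertools.product(GRID, GRID):
        score = train_and_eval(gamma=gamma, lam=lam)
        if score > best_score:
            best_score, best_cfg = score, (gamma, lam)
    return best_cfg, best_score
```

With 6 candidate values per weight this is 36 full training runs, which is why the paper reports tuning only these two scalars on the development set rather than a larger search.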
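The Dataset Splits row quotes the paper's answerability-threshold tuning: sweep a scalar threshold on the development set and keep the value that maximizes F1. A minimal sketch of that procedure follows; `compute_f1`, the `na_score` field, and the example dicts are hypothetical stand-ins (the paper would use the official SQuAD 2.0 scorer), assumed here only to show the control flow.

```python
def tune_threshold(dev_examples, compute_f1, candidates):
    """Predict 'no answer' (empty string) whenever the model's
    no-answer score exceeds the threshold; return the threshold
    with the best dev-set F1. All names here are illustrative."""
    best_f1, best_t = float("-inf"), None
    for t in candidates:
        preds = {
            ex["id"]: ("" if ex["na_score"] > t else ex["answer"])
            for ex in dev_examples
        }
        f1 = compute_f1(preds)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_t, best_f1
```

Because the threshold is a single scalar chosen after training, it can be tuned cheaply on the development set without retraining the reader or the verifier.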