Read + Verify: Machine Reading Comprehension with Unanswerable Questions
Authors: Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the SQuAD 2.0 dataset show that our system obtains a score of 74.2 F1 on test set, achieving state-of-the-art results at the time of submission (Aug. 28th, 2018). |
| Researcher Affiliation | Collaboration | (1) College of Computer, National University of Defense Technology; (2) Microsoft Research Asia |
| Pseudocode | No | The paper includes architectural diagrams (Figure 2) but no formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We evaluate our approach on the SQuAD 2.0 dataset (Rajpurkar, Jia, and Liang 2018). |
| Dataset Splits | Yes | We tune this threshold to maximize F1 score on the development set, and report both of EM (Exact Match) and F1 metrics. (A hedged sketch of this threshold search appears after the table.) |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using 'the nltk tokenizer' but does not provide a specific version number for nltk or any other software dependency. |
| Experiment Setup | Yes | We run a grid search on γ and λ among [0.1, 0.3, 0.5, 0.7, 1, 2]. Based on the performance on development set, we set γ as 0.3 and λ to be 1. ... For Model-II, the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0008 is used, the hidden size is set as 300, and a dropout (Srivastava et al. 2014) of 0.3 is applied for preventing overfitting. The batch size is 48 for the reader, 64 for Model-II, and 32 for Model-I as well as Model-III. (See the grid-search sketch after the table.) |
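The threshold tuning quoted in the Dataset Splits row can be made concrete. The paper states only the objective (pick the no-answer threshold that maximizes F1 on the development set); the function names, the data layout, and the sweep over observed verifier scores below are assumptions, so this is a minimal sketch rather than the authors' procedure.

```python
# Minimal sketch of dev-set threshold tuning, assuming per-question
# no-answer scores from the verifier and per-question span F1 values.
import numpy as np

def f1_at_threshold(no_answer_scores, span_f1s, is_unanswerable, threshold):
    """Average F1 when abstaining whenever the no-answer score exceeds threshold.

    no_answer_scores : hypothetical verifier scores (higher = more likely unanswerable)
    span_f1s         : token-level F1 of each predicted span against the gold answers
    is_unanswerable  : gold labels (True if the question has no answer)
    """
    abstain = no_answer_scores > threshold
    # Abstaining on an unanswerable question scores 1.0; abstaining on an
    # answerable one scores 0.0; answering scores the span F1 (which is 0.0
    # whenever the question was actually unanswerable).
    per_question = np.where(
        abstain,
        is_unanswerable.astype(float),
        np.where(is_unanswerable, 0.0, span_f1s),
    )
    return per_question.mean()

def tune_threshold(no_answer_scores, span_f1s, is_unanswerable):
    # Sweep candidate thresholds over the observed score values.
    candidates = np.unique(no_answer_scores)
    scores = [
        f1_at_threshold(no_answer_scores, span_f1s, is_unanswerable, t)
        for t in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```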
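Likewise, the Experiment Setup row can be read as a small training configuration. Only the grid [0.1, 0.3, 0.5, 0.7, 1, 2], the selected γ = 0.3 and λ = 1, the Adam learning rate of 0.0008, hidden size 300, dropout 0.3, and the batch sizes come from the paper; the model class and the `train_and_eval` helper are hypothetical placeholders, not the authors' implementation.

```python
# Hedged sketch: grid search over the loss weights γ (gamma) and λ (lam),
# with the reported Model-II optimizer settings wired into a toy module.
import itertools

import torch
import torch.nn as nn

GRID = [0.1, 0.3, 0.5, 0.7, 1, 2]   # candidate values for both γ and λ (paper)
BATCH_SIZES = {"reader": 48, "model_ii": 64, "model_i": 32, "model_iii": 32}

class ToyAnswerVerifier(nn.Module):
    """Placeholder stand-in for Model-II; hidden size and dropout per the paper."""
    def __init__(self, vocab_size=30000, hidden=300, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.drop = nn.Dropout(dropout)          # dropout of 0.3 (paper)
        self.classify = nn.Linear(hidden, 2)     # answerable vs. unanswerable

    def forward(self, token_ids):
        x = self.drop(self.embed(token_ids)).mean(dim=1)  # crude pooling
        return self.classify(x)

def train_and_eval(gamma: float, lam: float) -> float:
    """Hypothetical helper: train with loss weights (gamma, lam), return dev F1."""
    model = ToyAnswerVerifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0008)  # paper's lr
    # ... training loop over batches of size BATCH_SIZES["model_ii"] ...
    return 0.0  # stub

def grid_search():
    best_gamma, best_lam, best_f1 = None, None, float("-inf")
    for gamma, lam in itertools.product(GRID, GRID):
        f1 = train_and_eval(gamma, lam)
        if f1 > best_f1:
            best_gamma, best_lam, best_f1 = gamma, lam, f1
    return best_gamma, best_lam   # the paper settles on γ = 0.3, λ = 1
```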