
Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Authors: Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li

Pages: 6529-6537

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on the SQuAD 2.0 dataset show that our system obtains a score of 74.2 F1 on test set, achieving state-of-the-art results at the time of submission (Aug. 28th, 2018).
Researcher Affiliation	Collaboration	(1) College of Computer, National University of Defense Technology; (2) Microsoft Research Asia
Pseudocode	No	The paper includes architectural diagrams (Figure 2) but no formal pseudocode blocks or algorithms.
Open Source Code	No	The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets	Yes	We evaluate our approach on the SQuAD 2.0 dataset (Rajpurkar, Jia, and Liang 2018).
Dataset Splits	Yes	We tune this threshold to maximize F1 score on the development set, and report both of EM (Exact Match) and F1 metrics.
Hardware Specification	No	The paper does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies	No	The paper mentions using 'the nltk tokenizer' but does not provide a specific version number for nltk or any other software dependency.
Experiment Setup	Yes	We run a grid search on γ and λ among [0.1, 0.3, 0.5, 0.7, 1, 2]. Based on the performance on development set, we set γ as 0.3 and λ to be 1. ... For Model-II, the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0008 is used, the hidden size is set as 300, and a dropout (Srivastava et al. 2014) of 0.3 is applied for preventing overfitting. The batch size is 48 for the reader, 64 for Model-II, and 32 for Model-I as well as Model-III.
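
The table quotes two tuning procedures: a grid search over γ and λ, and a no-answer threshold chosen to maximize F1 on the development set. Since the paper releases no code, the following is only a minimal sketch of how such procedures might look. The names `grid_search`, `tune_threshold`, `dev_f1`, `na_scores`, `answer_f1`, and `unanswerable` are hypothetical, and the abstain rule (score at or above the threshold means "no answer") is an assumption rather than the authors' stated implementation.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# Candidate values quoted in the Experiment Setup row.
GRID = [0.1, 0.3, 0.5, 0.7, 1, 2]


def grid_search(
    dev_f1: Callable[[float, float], float],
    values: Sequence[float] = GRID,
) -> Tuple[float, float, float]:
    """Try every (gamma, lambda) pair and keep the one with the best dev F1.

    `dev_f1` is a hypothetical callback that trains/evaluates a model with
    the given weights and returns its development-set F1.
    """
    best_gamma, best_lam, best_score = values[0], values[0], float("-inf")
    for gamma, lam in product(values, values):
        score = dev_f1(gamma, lam)
        if score > best_score:
            best_gamma, best_lam, best_score = gamma, lam, score
    return best_gamma, best_lam, best_score


def tune_threshold(
    na_scores: Sequence[float],
    answer_f1: Sequence[float],
    unanswerable: Sequence[bool],
) -> Tuple[float, float]:
    """Pick the no-answer threshold that maximizes mean F1 on a dev set.

    Assumed rule: abstain when the no-answer score is >= threshold.
    Abstaining scores 1.0 on an unanswerable question and 0.0 otherwise;
    answering scores the span F1 on an answerable question and 0.0 otherwise.
    """
    # Each observed score is a candidate; +inf means "always answer".
    candidates = sorted(set(na_scores)) + [float("inf")]
    best_t, best_mean = candidates[0], float("-inf")
    for t in candidates:
        total = 0.0
        for score, f1, is_unanswerable in zip(na_scores, answer_f1, unanswerable):
            if score >= t:  # abstain
                total += 1.0 if is_unanswerable else 0.0
            else:  # answer with the predicted span
                total += 0.0 if is_unanswerable else f1
        mean = total / len(na_scores)
        if mean > best_mean:
            best_t, best_mean = t, mean
    return best_t, best_mean
```

In practice `dev_f1` would wrap a full training and evaluation run, so the 36 grid points are the expensive part; the threshold sweep only reuses scores already computed on the development set.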