Retrospective Reader for Machine Reading Comprehension
Authors: Zhuosheng Zhang, Junjie Yang, Hai Zhao
AAAI 2021, pp. 14506–14514
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed reader is evaluated on two benchmark MRC challenge datasets SQuAD2.0 and NewsQA, achieving new state-of-the-art results. Significance tests show that our model is significantly better than strong baselines. |
| Researcher Affiliation | Academia | Zhuosheng Zhang1,2,3, Junjie Yang2,3,4, Hai Zhao1,2,3, 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China 4SJTU-ParisTech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China zhangzs@sjtu.edu.cn, jj-yang@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn |
| Pseudocode | No | The paper describes the model architecture and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/cooelf/AwesomeMRC. |
| Open Datasets | Yes | Our proposed reader is evaluated in two benchmark MRC challenges. SQuAD2.0 As a widely used MRC benchmark dataset, SQuAD2.0 (Rajpurkar, Jia, and Liang 2018)... NewsQA (Trischler et al. 2017) is a question-answering dataset... |
| Dataset Splits | Yes | SQuAD2.0... The training dataset contains 87k answerable and 43k unanswerable questions. NewsQA... The training dataset has 20k unanswerable questions among 97k questions. Hyper-parameters were selected using the dev set. |
| Hardware Specification | No | The paper mentions using pre-trained language models like BERT, ALBERT, and ELECTRA and fine-tuning them, but it does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using PyTorch and TensorFlow implementations and refers to the Hugging Face Transformers and Google Research ELECTRA repositories, but does not provide specific version numbers for any software dependencies (see the model-loading sketch after the table). |
| Experiment Setup | Yes | For the fine-tuning in our tasks, we set the initial learning rate in {2e-5, 3e-5} with a warmup rate of 0.1, and L2 weight decay of 0.01. The batch size is selected in {32, 48}. The maximum number of epochs is set to 2 for all the experiments. Texts are tokenized using wordpieces (Wu et al. 2016), with a maximum length of 512. Hyper-parameters were selected using the dev set. The manual weights are α1 = α2 = β1 = β2 = 0.5 in this work. (See the configuration sketch after the table.) |
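
Because the report names Hugging Face Transformers and the Google Research ELECTRA repository without pinning versions, the following is a minimal sketch, not the authors' released code, of how an ELECTRA backbone with a span-prediction head could be loaded and queried. The checkpoint name `google/electra-large-discriminator` and the `AutoModelForQuestionAnswering` API are assumptions based on current Transformers releases; the authoritative implementation is the repository linked above.

```python
# Hypothetical model-loading sketch with Hugging Face Transformers.
# Checkpoint and API choices are assumptions, not details stated in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("google/electra-large-discriminator")
# Note: the QA head is randomly initialized until fine-tuned on SQuAD2.0/NewsQA.
model = AutoModelForQuestionAnswering.from_pretrained("google/electra-large-discriminator")

question = "When was SQuAD 2.0 released?"
context = "SQuAD 2.0 was introduced by Rajpurkar, Jia, and Liang in 2018."

# Encode the question/passage pair with the 512-token limit quoted in the paper.
inputs = tokenizer(question, context, max_length=512,
                   truncation="only_second", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Start/end logits over the input tokens; the predicted span is the argmax pair.
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1])
print(answer)
```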
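
The quoted hyper-parameters can also be collected into a fine-tuning configuration. Below is a minimal sketch of the optimizer and warm-up schedule those settings imply, assuming AdamW with a linear warm-up (the standard BERT-style fine-tuning recipe, not confirmed by the paper); the `albert-base-v2` placeholder checkpoint, the interpretation of "warmup rate" as a fraction of total steps, and `steps_per_epoch` are illustrative assumptions.

```python
# Sketch of the reported fine-tuning settings; optimizer/scheduler choices
# (AdamW + linear warm-up) are assumptions, not stated in the paper.
import torch
from transformers import AutoModelForQuestionAnswering, get_linear_schedule_with_warmup

config = {
    "learning_rate": 2e-5,   # selected from {2e-5, 3e-5} on the dev set
    "warmup_rate": 0.1,
    "weight_decay": 0.01,    # L2 weight decay
    "batch_size": 32,        # selected from {32, 48}
    "num_epochs": 2,
    "max_seq_length": 512,   # wordpiece tokens
    "alpha1": 0.5, "alpha2": 0.5,  # manual loss-combination weights (reported as 0.5)
    "beta1": 0.5, "beta2": 0.5,    # manual verification-score weights (reported as 0.5)
}

# Placeholder backbone; the paper fine-tunes BERT/ALBERT/ELECTRA variants.
model = AutoModelForQuestionAnswering.from_pretrained("albert-base-v2")

steps_per_epoch = 1000  # illustrative; depends on dataset size and batch size
num_training_steps = config["num_epochs"] * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(),
                              lr=config["learning_rate"],
                              weight_decay=config["weight_decay"])
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(config["warmup_rate"] * num_training_steps),
    num_training_steps=num_training_steps,
)
```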