Retrospective Reader for Machine Reading Comprehension
Authors: Zhuosheng Zhang, Junjie Yang, Hai Zhao
AAAI 2021, pp. 14506–14514
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed reader is evaluated on two benchmark MRC challenge datasets SQuAD2.0 and NewsQA, achieving new state-of-the-art results. Significance tests show that our model is significantly better than strong baselines. |
| Researcher Affiliation | Academia | Zhuosheng Zhang1,2,3, Junjie Yang2,3,4, Hai Zhao1,2,3, 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China 4SJTU-ParisTech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China zhangzs@sjtu.edu.cn, jj-yang@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn |
| Pseudocode | No | The paper describes the model architecture and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/cooelf/AwesomeMRC. |
| Open Datasets | Yes | Our proposed reader is evaluated in two benchmark MRC challenges. SQuAD2.0 As a widely used MRC benchmark dataset, SQuAD2.0 (Rajpurkar, Jia, and Liang 2018)... NewsQA (Trischler et al. 2017) is a question-answering dataset... |
| Dataset Splits | Yes | SQuAD2.0... The training dataset contains 87k answerable and 43k unanswerable questions. NewsQA... The training dataset has 20k unanswerable questions among 97k questions. Hyper-parameters were selected using the dev set. |
| Hardware Specification | No | The paper mentions using pre-trained language models like BERT, ALBERT, and ELECTRA and fine-tuning them, but it does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using PyTorch and TensorFlow implementations and refers to the Hugging Face Transformers and Google Research ELECTRA repositories, but does not provide specific version numbers for any software dependencies (see the model-loading sketch after the table). |
| Experiment Setup | Yes | For the fine-tuning in our tasks, we set the initial learning rate in {2e-5, 3e-5} with a warmup rate of 0.1, and L2 weight decay of 0.01. The batch size is selected in {32, 48}. The maximum number of epochs is set to 2 for all the experiments. Texts are tokenized using wordpieces (Wu et al. 2016), with a maximum length of 512. Hyper-parameters were selected using the dev set. The manual weights are α1 = α2 = β1 = β2 = 0.5 in this work. (See the configuration sketch after the table.) |
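
Because the report names Hugging Face Transformers and the Google Research ELECTRA repository without pinning versions, the following is a minimal sketch, not the authors' released code, of how an ELECTRA backbone with a span-prediction head could be loaded and queried. The checkpoint name `google/electra-large-discriminator` and the `AutoModelForQuestionAnswering` API are assumptions based on current Transformers releases; the authoritative implementation is the repository linked above.

```python
# Hypothetical model-loading sketch with Hugging Face Transformers.
# Checkpoint and API choices are assumptions, not details stated in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("google/electra-large-discriminator")
# Note: the QA head is randomly initialized until fine-tuned on SQuAD2.0/NewsQA.
model = AutoModelForQuestionAnswering.from_pretrained("google/electra-large-discriminator")

question = "When was SQuAD 2.0 released?"
context = "SQuAD 2.0 was introduced by Rajpurkar, Jia, and Liang in 2018."

# Encode the question/passage pair with the 512-token limit quoted in the paper.
inputs = tokenizer(question, context, max_length=512,
                   truncation="only_second", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Start/end logits over the input tokens; the predicted span is the argmax pair.
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1])
print(answer)
```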
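
The quoted hyper-parameters can also be collected into a fine-tuning configuration. Below is a minimal sketch of the optimizer and warm-up schedule those settings imply, assuming AdamW with a linear warm-up (the standard BERT-style fine-tuning recipe, not confirmed by the paper); the `albert-base-v2` placeholder checkpoint, the interpretation of "warmup rate" as a fraction of total steps, and `steps_per_epoch` are illustrative assumptions.

```python
# Sketch of the reported fine-tuning settings; optimizer/scheduler choices
# (AdamW + linear warm-up) are assumptions, not stated in the paper.
import torch
from transformers import AutoModelForQuestionAnswering, get_linear_schedule_with_warmup

config = {
    "learning_rate": 2e-5,   # selected from {2e-5, 3e-5} on the dev set
    "warmup_rate": 0.1,
    "weight_decay": 0.01,    # L2 weight decay
    "batch_size": 32,        # selected from {32, 48}
    "num_epochs": 2,
    "max_seq_length": 512,   # wordpiece tokens
    "alpha1": 0.5, "alpha2": 0.5,  # manual loss-combination weights (reported as 0.5)
    "beta1": 0.5, "beta2": 0.5,    # manual verification-score weights (reported as 0.5)
}

# Placeholder backbone; the paper fine-tunes BERT/ALBERT/ELECTRA variants.
model = AutoModelForQuestionAnswering.from_pretrained("albert-base-v2")

steps_per_epoch = 1000  # illustrative; depends on dataset size and batch size
num_training_steps = config["num_epochs"] * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(),
                              lr=config["learning_rate"],
                              weight_decay=config["weight_decay"])
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(config["warmup_rate"] * num_training_steps),
    num_training_steps=num_training_steps,
)
```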