Robust Domain Adaptation for Machine Reading Comprehension
Authors: Liang Jiang, Zhenyu Huang, Jia Liu, Zujie Wen, Xi Peng
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three datasets demonstrate the effectiveness of our method. ... In this section, we evaluate the RMRC on three datasets by comparing it with three MRC domain adaptation methods. |
| Researcher Affiliation | Collaboration | Liang Jiang¹, Zhenyu Huang², Jia Liu¹, Zujie Wen¹, Xi Peng²; ¹Ant Group, ²College of Computer Science, Sichuan University |
| Pseudocode | No | The paper describes the method using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We pretrain the MRC model on the SQuAD dataset (Rajpurkar et al. 2016) and fine-tune it on three target datasets including two public datasets (QuAC (Choi et al. 2018) and CoQA (Reddy, Chen, and Manning 2019)) and one real-world dataset from Alipay. Note that, as the Alipay data is in Chinese, we use another Chinese corpus instead of SQuAD for pre-training, i.e., a collection of CMRC (Cui et al. 2018), DRCD (Shao et al. 2018) and DUREADER (He et al. 2017). (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper provides explicit sizes for training and testing sets for QuAC and CoQA (e.g., '11,567 training documents and 1,000 testing documents' for QuAC), but it does not explicitly state the size or percentage of a separate validation dataset split. Although it mentions 'The optimal parameters are determined by the grid search in the Alipay dataset', implying a validation process, a specific validation split is not quantified in the text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run the experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or memory specifications. It only describes the BERT model architecture used. |
| Software Dependencies | No | The paper mentions software components like 'BERT' and 'Adam optimizer' but does not provide specific version numbers for any software, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | In our experiments, we take the widely-used BERT as the base encoder for the QS and the MRC encoder. The BERT network contains 12 hidden layers, each of which consists of 12 attention heads. The maximal input length and the hidden size are fixed to 512 and 768, respectively. ... For all experiments, we generate n-grams for each document by setting K = 7 for Eq. 1 and set the threshold for answer filtering γ to 0.7 and select the question by fixing κ in Eq. 6 to 5. ... We set the baseline score of the reward rb to 0.7 for Eq. 14 in all experiments. For network training, we use the Adam optimizer whose learning rate is set to 2e-5 and 1e-5 for pre-training and fine-tuning, respectively. (A configuration sketch follows the table.) |
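
The public corpora quoted in the Open Datasets row can be fetched programmatically. The sketch below is not taken from the paper; it assumes the Hugging Face `datasets` library and the hub identifier `"squad"`. QuAC, CoQA, and the Chinese pre-training corpora (CMRC, DRCD, DUREADER) would be obtained analogously from their official releases, and the Alipay data is proprietary.

```python
# Minimal sketch (not from the paper): fetching the public source-domain corpus
# with the Hugging Face `datasets` library. The hub identifier "squad" is an
# assumption; QuAC and CoQA come from their official releases, and the Alipay
# target data is proprietary and not publicly available.
from datasets import load_dataset

squad = load_dataset("squad")   # SQuAD, used for source-domain pre-training
print(squad)                    # shows the available splits and their sizes

# For the Chinese setting, the paper replaces SQuAD with a collection of
# CMRC, DRCD, and DUREADER; those corpora would be prepared the same way.
```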
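
The encoder and optimizer settings quoted in the Experiment Setup row translate into a short configuration. The sketch below assumes the Hugging Face `transformers` and PyTorch APIs; it only mirrors the reported hyper-parameters and does not implement the RMRC-specific components (question selection, answer filtering, or the reward baseline).

```python
# Minimal sketch, assuming the `transformers` and PyTorch APIs, of the encoder
# and optimizer settings reported in the paper; RMRC-specific modules are omitted.
import torch
from transformers import BertConfig, BertModel

config = BertConfig(
    num_hidden_layers=12,          # 12 hidden layers
    num_attention_heads=12,        # 12 attention heads per layer
    hidden_size=768,               # hidden size fixed to 768
    max_position_embeddings=512,   # maximal input length fixed to 512
)
encoder = BertModel(config)

# Adam with the learning rates reported for the two training stages.
pretrain_opt = torch.optim.Adam(encoder.parameters(), lr=2e-5)  # pre-training
finetune_opt = torch.optim.Adam(encoder.parameters(), lr=1e-5)  # fine-tuning

# Method-level hyper-parameters quoted from the paper (used by RMRC itself):
K = 7        # n-gram length in Eq. 1
GAMMA = 0.7  # answer-filtering threshold
KAPPA = 5    # question-selection parameter in Eq. 6
R_B = 0.7    # reward baseline score in Eq. 14
```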