Robust Domain Adaptation for Machine Reading Comprehension
Authors: Liang Jiang, Zhenyu Huang, Jia Liu, Zujie Wen, Xi Peng
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three datasets demonstrate the effectiveness of our method. ... In this section, we evaluate the RMRC on three datasets by comparing it with three MRC domain adaptation methods. |
| Researcher Affiliation | Collaboration | Liang Jiang¹, Zhenyu Huang², Jia Liu¹, Zujie Wen¹, Xi Peng²; ¹Ant Group, ²College of Computer Science, Sichuan University |
| Pseudocode | No | The paper describes the method using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We pretrain the MRC model on the SQuAD dataset (Rajpurkar et al. 2016) and fine-tune it on three target datasets including two public datasets (QuAC (Choi et al. 2018) and CoQA (Reddy, Chen, and Manning 2019)) and one real-world dataset from Alipay. Note that, as the Alipay data is in Chinese, we use another Chinese corpus instead of SQuAD for pre-training, i.e., a collection of CMRC (Cui et al. 2018), DRCD (Shao et al. 2018) and DUREADER (He et al. 2017). (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper provides explicit sizes for training and testing sets for QuAC and CoQA (e.g., '11,567 training documents and 1,000 testing documents' for QuAC), but it does not explicitly state the size or percentage of a separate validation dataset split. Although it mentions 'The optimal parameters are determined by the grid search in the Alipay dataset', implying a validation process, a specific validation split is not quantified in the text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run the experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or memory specifications. It only describes the BERT model architecture used. |
| Software Dependencies | No | The paper mentions software components like 'BERT' and 'Adam optimizer' but does not provide specific version numbers for any software, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | In our experiments, we take the widely-used BERT as the base encoder for the QS and the MRC encoder. The BERT network contains 12 hidden layers, each of which consists of 12 attention heads. The maximal input length and the hidden size are fixed to 512 and 768, respectively. ... For all experiments, we generate n-grams for each document by setting K = 7 for Eq. 1 and set the threshold for answer filtering γ to 0.7 and select the question by fixing κ in Eq. 6 to 5. ... We set the baseline score of the reward rb to 0.7 for Eq. 14 in all experiments. For network training, we use the Adam optimizer whose learning rate is set to 2e-5 and 1e-5 for pre-training and fine-tuning, respectively. (A configuration sketch follows the table.) |
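
The public corpora quoted in the Open Datasets row can be fetched programmatically. The sketch below is not taken from the paper; it assumes the Hugging Face `datasets` library and the hub identifier `"squad"`. QuAC, CoQA, and the Chinese pre-training corpora (CMRC, DRCD, DUREADER) would be obtained analogously from their official releases, and the Alipay data is proprietary.

```python
# Minimal sketch (not from the paper): fetching the public source-domain corpus
# with the Hugging Face `datasets` library. The hub identifier "squad" is an
# assumption; QuAC and CoQA come from their official releases, and the Alipay
# target data is proprietary and not publicly available.
from datasets import load_dataset

squad = load_dataset("squad")   # SQuAD, used for source-domain pre-training
print(squad)                    # shows the available splits and their sizes

# For the Chinese setting, the paper replaces SQuAD with a collection of
# CMRC, DRCD, and DUREADER; those corpora would be prepared the same way.
```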
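
The encoder and optimizer settings quoted in the Experiment Setup row translate into a short configuration. The sketch below assumes the Hugging Face `transformers` and PyTorch APIs; it only mirrors the reported hyper-parameters and does not implement the RMRC-specific components (question selection, answer filtering, or the reward baseline).

```python
# Minimal sketch, assuming the `transformers` and PyTorch APIs, of the encoder
# and optimizer settings reported in the paper; RMRC-specific modules are omitted.
import torch
from transformers import BertConfig, BertModel

config = BertConfig(
    num_hidden_layers=12,          # 12 hidden layers
    num_attention_heads=12,        # 12 attention heads per layer
    hidden_size=768,               # hidden size fixed to 768
    max_position_embeddings=512,   # maximal input length fixed to 512
)
encoder = BertModel(config)

# Adam with the learning rates reported for the two training stages.
pretrain_opt = torch.optim.Adam(encoder.parameters(), lr=2e-5)  # pre-training
finetune_opt = torch.optim.Adam(encoder.parameters(), lr=1e-5)  # fine-tuning

# Method-level hyper-parameters quoted from the paper (used by RMRC itself):
K = 7        # n-gram length in Eq. 1
GAMMA = 0.7  # answer-filtering threshold
KAPPA = 5    # question-selection parameter in Eq. 6
R_B = 0.7    # reward baseline score in Eq. 14
```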