Unsupervised Domain Adaptation on Reading Comprehension

Authors: Yu Cao, Meng Fang, Baosheng Yu, Joey Tianyi Zhou

AAAI 2020, pp. 7480-7487

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show our approach achieves comparable performance to supervised models on multiple large-scale benchmark datasets.
Researcher Affiliation | Collaboration | Yu Cao¹, Meng Fang², Baosheng Yu¹, Joey Tianyi Zhou³. ¹UBTECH Sydney AI Center, School of Computer Science, FEIT, The University of Sydney, Australia; ²Department of Computer Science, University of Waikato, New Zealand; ³Institute of High Performance Computing, A*STAR, Singapore
Pseudocode | Yes | Algorithm 1: CASe. Given a BERT feature network F, an output network G, and a discriminator D. (A minimal sketch of this loop appears after this table.)
Open Source Code | Yes | Code available at: https://github.com/caoyu1991/CASe
Open Datasets | Yes | SQuAD (Rajpurkar et al. 2016) contains 87k training samples and 11k validation (dev) samples, with questions in natural language given by workers based on paragraphs from Wikipedia. ... CNN and DailyMail (Hermann et al. 2015) contain 374k training and 4k dev samples...
Dataset Splits | Yes | SQuAD (Rajpurkar et al. 2016) contains 87k training samples and 11k validation (dev) samples... Table 1: Characterizations of datasets after processing (lists Train and Dev sample counts).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) were found in the paper. It only mentions using a 'BERT implementation in PyTorch' and a 'base-uncased pretrained model'.
Software Dependencies | No | The paper mentions 'PyTorch' and 'Adam optimizer' but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Adam optimizer (Kingma and Ba 2014) is employed with learning rate 3×10⁻⁵ in the source domain training, 2×10⁻⁵ in the self-training and 10⁻⁵ in the adversarial learning, with batch size 12. A dropout with rate 0.2 is applied on both the BERT feature network and the discriminator. We set the epoch number N_pre = 3 in pre-training and N_da = 4 in domain adaptation. ... Generating probability threshold T_prob is set as 0.4 and n_best = 20.
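
The Pseudocode row names the three components of CASe: a BERT feature network F, an output network G, and a domain discriminator D, trained by alternating self-training on confident pseudo-labeled target samples with adversarial feature alignment. The following is a minimal sketch of such an alternating loop, not the authors' implementation: the stand-in networks, synthetic batches, and the `pseudo_label` helper are all illustrative assumptions; only the threshold, batch size, epoch count, and learning rates come from the rows above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as TF

# Toy stand-ins for the three networks of Algorithm 1; in the paper F is
# BERT-base and G predicts answer-span start/end positions.
feat_net = nn.Linear(32, 16)                                         # F: feature network
out_net = nn.Linear(16, 2)                                           # G: output network
disc = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))   # D: discriminator

# Learning rates from the Experiment Setup row (2e-5 self-training, 1e-5 adversarial).
opt_fg = torch.optim.Adam(
    list(feat_net.parameters()) + list(out_net.parameters()), lr=2e-5)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-5)

T_PROB = 0.4  # pseudo-label confidence threshold from the paper

def pseudo_label(batch):
    """Keep target samples whose top predicted probability exceeds T_PROB
    (hypothetical helper; the real model filters candidate answer spans)."""
    with torch.no_grad():
        probs = TF.softmax(out_net(feat_net(batch)), dim=-1)
        conf, labels = probs.max(dim=-1)
    keep = conf > T_PROB
    return batch[keep], labels[keep]

for epoch in range(4):                 # N_da = 4 adaptation epochs
    src = torch.randn(12, 32)          # stand-in source batch (batch size 12)
    tgt = torch.randn(12, 32)          # stand-in target batch

    # (1) Self-training: fit F and G on confident pseudo-labeled target samples.
    x, y = pseudo_label(tgt)
    if len(x) > 0:
        loss_self = TF.cross_entropy(out_net(feat_net(x)), y)
        opt_fg.zero_grad(); loss_self.backward(); opt_fg.step()

    # (2) Adversarial learning: D learns to tell source features from target ones...
    feats = torch.cat([feat_net(src), feat_net(tgt)]).detach()
    dom = torch.cat([torch.ones(12, 1), torch.zeros(12, 1)])
    loss_d = TF.binary_cross_entropy_with_logits(disc(feats), dom)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # ...while F is updated with inverted labels so target features fool D.
    loss_adv = TF.binary_cross_entropy_with_logits(
        disc(feat_net(tgt)), torch.ones(12, 1))
    opt_fg.zero_grad(); loss_adv.backward(); opt_fg.step()
```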
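
For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration; the key names below are illustrative assumptions, not the authors' configuration schema.

```python
# Hyperparameters transcribed from the paper's experiment setup; key names
# are illustrative, not taken from the released code.
HPARAMS = {
    "lr_source_training": 3e-5,
    "lr_self_training": 2e-5,
    "lr_adversarial": 1e-5,
    "batch_size": 12,
    "dropout": 0.2,                  # on BERT feature network and discriminator
    "epochs_pretrain": 3,            # N_pre
    "epochs_adaptation": 4,          # N_da
    "pseudo_label_threshold": 0.4,   # T_prob
    "n_best": 20,
}
```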