A Robust Adversarial Training Approach to Machine Reading Comprehension

Authors: Kai Liu, Xin Liu, An Yang, Jing Liu, Jinsong Su, Sujian Li, Qiaoqiao She (pp. 8392–8400)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | When applied to the state-of-the-art MRC models, including QANet, BERT and ERNIE 2.0, our approach obtains significant and comprehensive improvements on 5 adversarial datasets constructed in different ways, without sacrificing the performance on the original SQuAD development set. Sections such as "Experiments" and "Results and Discussions" also indicate empirical studies.
Researcher Affiliation | Collaboration | Kai Liu (1), Xin Liu (2), An Yang (3), Jing Liu (1), Jinsong Su (2), Sujian Li (3), Qiaoqiao She (1); affiliations: (1) Baidu Inc., Beijing, China; (2) Xiamen University, Xiamen, China; (3) Key Laboratory of Computational Linguistics, Peking University, MOE, China
Pseudocode | Yes | Algorithm 1: Adversarial Training Strategy
Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | SQuAD (Rajpurkar et al. 2016): One of the most popular MRC datasets. The dataset consists of 87.5K question-answer training pairs... We select SQuAD v1.0 as the training dataset. AddSentDiverse (ASD) (Wang and Bansal 2018): Based on the observation of AddSent (Jia and Liang 2017), they enriched the SQuAD training data with correspondingly designed adversarial examples.
Dataset Splits | Yes | SQuAD (DEV) (Rajpurkar et al. 2016): The development set of SQuAD v1.0, which contains 10K (q, p, s_g) triples for evaluation. We test the models on six different test sets, i.e., the standard SQuAD development set and five different types of adversarial test sets.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, memory) to run its experiments. It only mentions batch sizes, which are influenced by hardware but not the hardware itself.
Software Dependencies | No | The paper mentions using specific MRC models (QANet, BERT, ERNIE 2.0) but does not provide specific version numbers for any programming languages, libraries, or frameworks used in the implementation (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup | Yes | In the perturbation embedding training phase, we randomly insert perturbation embeddings between sentences and have the embeddings randomly initialized. During embedding training, we set the batch size of QANet to 32, BERT-base to 12, and BERT-large/ERNIE 2.0 to 4. We limit the perturbation sequence length l to 10. For each batch, we randomly set λq and λp to -10 or 10, and set λc to 0.5. We set s_d with a random length in the middle of each perturbation embedding. To determine convergence of the embedding training process, we set the threshold to 1.5 and the maximum number of training steps to 200, since most training losses become stable (differences below 1e-3) around 200 steps. In the training iteration, we set the maximum training time T to 5 and the training loss's stopping threshold ϵ to 12.0.
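
For a compact view of the reported setup, the Python sketch below restates the hyperparameters and the iterative training loop they imply. It is a minimal illustration only, assuming a generic training workflow; the callables it accepts (train_perturbation_embeddings, build_adversarial_examples, train_mrc_model) are hypothetical placeholders, not the authors' implementation.

# Minimal sketch of the reported training setup. All callables passed in
# below are hypothetical placeholders; the paper does not release code.

CONFIG = {
    "batch_size": {"QANet": 32, "BERT-base": 12, "BERT-large": 4, "ERNIE-2.0": 4},
    "perturbation_length": 10,           # l: perturbation sequence length limit
    "lambda_q_choices": (-10, 10),       # λq sampled per batch
    "lambda_p_choices": (-10, 10),       # λp sampled per batch
    "lambda_c": 0.5,                     # λc
    "embed_convergence_threshold": 1.5,  # embedding-training convergence threshold
    "embed_max_steps": 200,              # losses stabilize (< 1e-3 change) near 200 steps
    "max_iterations": 5,                 # T: maximum training iterations
    "stop_threshold": 12.0,              # ε: training-loss stopping threshold
}

def adversarial_training(model, train_data,
                         train_perturbation_embeddings,  # hypothetical callable
                         build_adversarial_examples,     # hypothetical callable
                         train_mrc_model,                # hypothetical callable
                         cfg=CONFIG):
    """Run at most T adversarial training rounds, stopping early once the
    MRC training loss drops below the threshold ε."""
    for _ in range(cfg["max_iterations"]):
        # Phase 1: learn perturbation embeddings (randomly initialized and
        # randomly inserted between sentences) until convergence or the
        # maximum number of embedding-training steps is reached.
        perturbations = train_perturbation_embeddings(
            model, train_data,
            max_steps=cfg["embed_max_steps"],
            threshold=cfg["embed_convergence_threshold"],
        )
        # Phase 2: retrain the MRC model on the original data augmented
        # with the generated adversarial examples.
        adv_examples = build_adversarial_examples(train_data, perturbations)
        train_loss = train_mrc_model(model, train_data + adv_examples)
        if train_loss < cfg["stop_threshold"]:
            break
    return model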