Multi-Matching Network for Multiple Choice Reading Comprehension

Authors: Min Tang, Jiaran Cai, Hankz Hankui Zhuo (pp. 7088-7095)

Venue: AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of our model, we evaluate MMN on a large-scale multiple choice machine reading comprehension dataset (i.e. RACE). Empirical results show that our proposed model achieves a significant improvement compared to strong baselines and obtains state-of-the-art results.
Researcher Affiliation | Academia | Min Tang, Jiaran Cai, Hankz Hankui Zhuo, School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. {tangm28, caijr5}@mail2.sysu.edu.cn, zhuohank@mail.sysu.edu.cn
Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block/figure.
Open Source Code | No | The paper mentions using open-source tools (SpaCy, TensorFlow, GloVe) but does not provide a link or statement for the public release of the authors' own MMN source code.
Open Datasets | Yes | To evaluate the effectiveness of our model, we conduct experiments on RACE (Lai et al. 2017), which is a large-scale multiple choice reading comprehension dataset.
Dataset Splits | Yes | We partition the train/dev/test sets in the same way as (Lai et al. 2017) does and use accuracy as the evaluation metric. The statistics of the RACE dataset are shown in Table 2.
Hardware Specification | Yes | All experiments were conducted on an NVIDIA TITAN Xp GPU card.
Software Dependencies | No | The paper mentions using SpaCy, TensorFlow, and GloVe, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We use 300D GloVe word embeddings which remain fixed during training. Each BiGRU holds 1 layer and 100 hidden units for each direction. To alleviate overfitting, we apply dropout (Srivastava et al. 2014) to the input of every layer with the dropout rate set to 0.2. The model is updated using mini-batch stochastic gradient descent with batch size of 32. We train our model using ADAM (Kingma and Ba 2014) with learning rate of 0.0003, where gradients are clipped in L2-norm to no larger than 10. Regularization coefficient is set to 1e-7. Early stopping technique is adopted after 50 epochs. (A hedged configuration sketch based on these settings follows the table.)
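
The Experiment Setup row reports enough hyperparameter detail to frame a rough reproduction scaffold. The following is a minimal TensorFlow 2 / Keras sketch of how those reported settings could be wired together; it is not the authors' MMN code. The MMN matching layers are omitted, and VOCAB_SIZE, SEQ_LEN, NUM_OPTIONS, the random stand-in GloVe matrix, the single BiGRU encoder feeding a Dense scoring head, and the early-stopping patience are all illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 50_000   # placeholder; set from the RACE vocabulary
EMBED_DIM = 300       # 300D GloVe vectors, kept fixed during training
SEQ_LEN = 400         # placeholder maximum token sequence length
NUM_OPTIONS = 4       # RACE questions have four candidate answers

# The pretrained GloVe matrix would be loaded here; random values stand in.
glove_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

l2 = tf.keras.regularizers.l2(1e-7)  # regularization coefficient 1e-7

tokens = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
x = tf.keras.layers.Embedding(
    VOCAB_SIZE, EMBED_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(glove_matrix),
    trainable=False,  # embeddings remain fixed
)(tokens)
x = tf.keras.layers.Dropout(0.2)(x)  # dropout 0.2 applied to layer inputs
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(100, kernel_regularizer=l2)  # 1 layer, 100 units per direction
)(x)
x = tf.keras.layers.Dropout(0.2)(x)
logits = tf.keras.layers.Dense(NUM_OPTIONS, kernel_regularizer=l2)(x)  # placeholder scoring head

model = tf.keras.Model(tokens, logits)
model.compile(
    # clipnorm clips each gradient tensor's L2 norm to 10 (per variable; the
    # paper's wording could instead mean a single global norm).
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4, clipnorm=10.0),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Training call, once train/dev tensors are prepared from RACE:
# model.fit(train_x, train_y, validation_data=(dev_x, dev_y),
#           batch_size=32, epochs=50,
#           callbacks=[tf.keras.callbacks.EarlyStopping(
#               monitor="val_accuracy", patience=3, restore_best_weights=True)])
```

The phrase "early stopping technique is adopted after 50 epochs" is ambiguous; the sketch reads it as training for at most 50 epochs with early stopping on dev accuracy, which is one plausible interpretation rather than a confirmed detail.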