Multi-Matching Network for Multiple Choice Reading Comprehension
Authors: Min Tang, Jiaran Cai, Hankz Hankui Zhuo
AAAI 2019, pp. 7088-7095
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of our model, we evaluate MMN on a large-scale multiple choice machine reading comprehension dataset (i.e. RACE). Empirical results show that our proposed model achieves a significant improvement compared to strong baselines and obtains state-of-the-art results. |
| Researcher Affiliation | Academia | Min Tang, Jiaran Cai, Hankz Hankui Zhuo School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China. {tangm28, caijr5}@mail2.sysu.edu.cn, zhuohank@mail.sysu.edu.cn |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block/figure. |
| Open Source Code | No | The paper mentions using open-source tools (SpaCy, TensorFlow, GloVe) but does not provide a link or statement for the public release of the authors' own MMN source code. |
| Open Datasets | Yes | To evaluate the effectiveness of our model, we conduct experiments on RACE (Lai et al. 2017) which is a large-scale multiple choice reading comprehension dataset. |
| Dataset Splits | Yes | We partition the train/dev/test sets in the same way as (Lai et al. 2017) does and use accuracy as the evaluation metric. The statistics of RACE dataset are shown in Table 2. |
| Hardware Specification | Yes | All experiments were conducted on a NVIDIA TITAN XP GPU Card. |
| Software Dependencies | No | The paper mentions using SpaCy, TensorFlow, and GloVe, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use 300D GloVe word embeddings which remain fixed during training. Each BiGRU holds 1 layer and 100 hidden units for each direction. To alleviate overfitting, we apply dropout (Srivastava et al. 2014) to the input of every layer with the dropout rate set to 0.2. The model is updated using mini-batch stochastic gradient descent with batch size of 32. We train our model using ADAM (Kingma and Ba 2014) with learning rate of 0.0003, where gradients are clipped in L2-norm to no larger than 10. Regularization coefficient is set to 1e-7. Early stopping technique is adopted after 50 epochs. |
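The experiment-setup quote above lists concrete hyperparameters (frozen 300D GloVe embeddings, single-layer BiGRUs with 100 units per direction, 0.2 dropout, batch size 32, Adam at 3e-4 with L2 gradient clipping at 10, 1e-7 regularization, early stopping after 50 epochs). Below is a minimal sketch, not the authors' released code, of how that training configuration could be wired up in tf.keras; the vocabulary size, sequence length, random stand-in for the GloVe matrix, the simple pooled classifier head, and the early-stopping patience are all placeholder assumptions, since the full MMN architecture is not reproduced here.

```python
# Hedged sketch of the reported training configuration (NOT the authors' MMN code).
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 50_000   # assumption: vocabulary size is not reported in the paper
EMBED_DIM = 300       # 300D GloVe embeddings, kept frozen during training
HIDDEN = 100          # 100 hidden units per direction in each BiGRU
DROPOUT = 0.2         # dropout applied to the input of every layer
L2_REG = 1e-7         # regularization coefficient from the paper
MAX_LEN = 400         # assumption: passage length cap is not reported

# Frozen embedding layer; random weights stand in for real GloVe vectors here.
glove_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")
embedding = tf.keras.layers.Embedding(
    VOCAB_SIZE, EMBED_DIM,
    embeddings_initializer=tf.keras.initializers.Constant(glove_matrix),
    trainable=False,
)

# Single-layer BiGRU encoder, 100 units per direction, with L2 weight regularization.
encoder = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(HIDDEN, return_sequences=True,
                        kernel_regularizer=tf.keras.regularizers.l2(L2_REG))
)

inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
x = tf.keras.layers.Dropout(DROPOUT)(embedding(inputs))
x = encoder(x)
x = tf.keras.layers.GlobalMaxPooling1D()(x)
x = tf.keras.layers.Dropout(DROPOUT)(x)
# Placeholder 4-way softmax standing in for MMN's answer-option scoring.
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Adam with learning rate 3e-4; clipnorm approximates the reported L2-norm clipping at 10.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-4, clipnorm=10.0)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Mini-batch training with batch size 32; the patience value is an assumption, since the
# paper only states that early stopping is adopted after 50 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5)
# model.fit(train_x, train_y, batch_size=32, epochs=200,
#           validation_data=(dev_x, dev_y), callbacks=[early_stop])
```

This sketch only captures the optimization and regularization settings quoted from the paper; reproducing the reported results would additionally require the paper's multi-matching architecture and the RACE data pipeline.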