Hierarchical Attention Flow for Multiple-Choice Reading Comprehension
Authors: Haichao Zhu, Furu Wei, Bing Qin, Ting Liu
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a large-scale multiple-choice reading comprehension dataset (i.e., the RACE dataset), the proposed model outperforms two previous neural network baselines on both the RACE-M and RACE-H subsets and yields state-of-the-art overall results. |
| Researcher Affiliation | Collaboration | Haichao Zhu, Furu Wei, Bing Qin, Ting Liu; SCIR, Harbin Institute of Technology, China; Microsoft Research, Beijing, China; {hczhu, qinb, tliu}@ir.hit.edu.cn, fuwei@microsoft.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | RACE (Lai et al. 2017) and MCTest (Richardson, Burges, and Renshaw 2013) are two representative benchmark datasets generated by humans for multiple-choice reading comprehension. The Large-scale ReAding Comprehension Dataset from Examinations (RACE) is a multiple-choice reading comprehension dataset. |
| Dataset Splits | Yes | RACE contains 27,933 passages and 97,687 questions in total, with 5% held out as the development set and 5% as the test set. Table 1 shows the split of the dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'Natural Language Toolkit' (NLTK) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | To train the model, we adopt stochastic gradient descent with the Adam optimizer (Kingma and Ba 2015) and an initial learning rate of 0.001. Gradients are clipped in L2-norm to no larger than 10. A mini-batch of 32 samples is used to update the model parameters per step. We keep the 50,000 most frequent words in the training set as the vocabulary and add a special token UNK for out-of-vocabulary (OOV) words. The hidden state size of all GRU networks is 128. We apply dropout (Srivastava et al. 2014) to word embeddings and BiGRU outputs with a drop rate of 0.4. |
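
The vocabulary handling described in the Experiment Setup row (the 50,000 most frequent training-set words plus a special UNK token for OOV words) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: NLTK is used only because the paper names the Natural Language Toolkit without specifying tokenizer settings, and the helper names `build_vocab` and `encode` are hypothetical.

```python
from collections import Counter
from nltk.tokenize import word_tokenize  # NLTK needs the 'punkt' tokenizer data downloaded

VOCAB_LIMIT = 50_000  # the 50,000 most frequent training-set words
UNK = "UNK"           # special token for out-of-vocabulary words

def build_vocab(training_texts):
    """Map the most frequent training-set words to integer ids; id 0 is UNK."""
    counts = Counter()
    for text in training_texts:
        counts.update(word_tokenize(text))
    vocab = {UNK: 0}
    for word, _ in counts.most_common(VOCAB_LIMIT):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert a passage/question/option string to ids, sending OOV words to UNK."""
    return [vocab.get(token, vocab[UNK]) for token in word_tokenize(text)]
```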
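
Likewise, the reported optimizer and network hyperparameters (Adam with initial learning rate 0.001, gradients clipped in L2-norm to at most 10, GRU hidden size 128, dropout 0.4 on embeddings and BiGRU outputs, mini-batches of 32) map onto a minimal TensorFlow/Keras configuration like the sketch below. The encoder is a generic bidirectional GRU stand-in rather than the paper's hierarchical attention flow architecture, and the embedding dimension is an assumption since it is not reported in this summary.

```python
import tensorflow as tf

VOCAB_SIZE = 50_000 + 1  # 50,000 most frequent words plus the UNK token
EMBED_DIM = 100          # assumption: the embedding size is not reported in this summary
HIDDEN_SIZE = 128        # hidden state size of all GRU networks
DROP_RATE = 0.4          # dropout on word embeddings and BiGRU outputs
BATCH_SIZE = 32          # mini-batch size per parameter update
LEARNING_RATE = 1e-3     # initial learning rate for Adam
CLIP_NORM = 10.0         # gradients clipped in L2-norm to no larger than 10

def build_bigru_encoder():
    """Generic BiGRU encoder reflecting the reported hyperparameters."""
    tokens = tf.keras.Input(shape=(None,), dtype="int32")
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
    x = tf.keras.layers.Dropout(DROP_RATE)(x)   # dropout on word embeddings
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(HIDDEN_SIZE, return_sequences=True))(x)
    x = tf.keras.layers.Dropout(DROP_RATE)(x)   # dropout on BiGRU outputs
    return tf.keras.Model(tokens, x)

# Adam with the reported learning rate; global_clipnorm (TF >= 2.4) clips the
# combined gradient L2 norm, one plausible reading of the paper's clipping rule.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, global_clipnorm=CLIP_NORM)
```

Training would then proceed with mini-batches of `BATCH_SIZE` samples per parameter update, as stated in the setup; all other architectural details belong to the paper's hierarchical attention flow model and are not reproduced here.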