Hierarchical Attention Flow for Multiple-Choice Reading Comprehension

Authors: Haichao Zhu, Furu Wei, Bing Qin, Ting Liu

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | On a large-scale multiple-choice reading comprehension dataset (i.e., the RACE dataset), the proposed model outperforms two previous neural network baselines on both the RACE-M and RACE-H subsets and yields state-of-the-art overall results. |
| Researcher Affiliation | Collaboration | Haichao Zhu, Furu Wei, Bing Qin, Ting Liu; SCIR, Harbin Institute of Technology, China; Microsoft Research, Beijing, China. {hczhu, qinb, tliu}@ir.hit.edu.cn, fuwei@microsoft.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not state that the source code for the described methodology is released and provides no link to a code repository. |
| Open Datasets | Yes | RACE (Lai et al. 2017) and MCTest (Richardson, Burges, and Renshaw 2013) are two representative human-generated benchmark datasets for multiple-choice reading comprehension. The Large-scale ReAding Comprehension Dataset From Examinations (RACE) is a multiple-choice reading comprehension dataset. |
| Dataset Splits | Yes | RACE contains 27,933 passages and 97,687 questions in total, with 5% of the data used as the development set and 5% as the test set. Table 1 of the paper shows the separation of the dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions TensorFlow and the Natural Language Toolkit (NLTK) but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | To train the model, we adopt stochastic gradient descent with the ADAM optimizer (Kingma and Ba 2015), with an initial learning rate of 0.001. Gradients are clipped in L2-norm to no larger than 10. A mini-batch of 32 samples is used to update the model parameters per step. We keep the 50,000 most frequent words in the training set as the vocabulary and add a special token UNK for out-of-vocabulary (OOV) words. The hidden state size of all GRU networks is 128. We apply dropout (Srivastava et al. 2014) to word embeddings and Bi-GRU outputs with a drop rate of 0.4. (A configuration sketch follows the table.) |
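
The hyperparameters reported in the Experiment Setup row map directly onto a training loop. Below is a minimal sketch in TensorFlow 2 (the paper mentions TensorFlow but no version), assuming an embedding size of 300 and a simple Bi-GRU classifier as a stand-in model body; it is not the paper's hierarchical attention flow architecture, only an illustration of the stated optimization settings: Adam with learning rate 0.001, global L2-norm gradient clipping at 10, mini-batches of 32, a 50,000-word vocabulary plus UNK, GRU hidden size 128, and dropout 0.4 on word embeddings and Bi-GRU outputs.

```python
# Hedged sketch of the reported training configuration, NOT the paper's
# hierarchical attention flow model. EMBED_DIM is an assumption.
import numpy as np
import tensorflow as tf

VOCAB_SIZE = 50_000 + 1   # 50,000 most frequent words + UNK token
EMBED_DIM = 300           # assumption: embedding size is not given in this excerpt
HIDDEN_SIZE = 128         # GRU hidden state size from the paper
DROP_RATE = 0.4           # dropout on embeddings and Bi-GRU outputs
BATCH_SIZE = 32           # samples per parameter update
CLIP_NORM = 10.0          # global L2-norm gradient clipping threshold
NUM_OPTIONS = 4           # RACE questions have four candidate answers

def build_model() -> tf.keras.Model:
    """Toy encoder: embedding -> dropout -> Bi-GRU -> dropout -> answer logits."""
    tokens = tf.keras.Input(shape=(None,), dtype=tf.int32)
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
    x = tf.keras.layers.Dropout(DROP_RATE)(x)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(HIDDEN_SIZE))(x)
    x = tf.keras.layers.Dropout(DROP_RATE)(x)
    logits = tf.keras.layers.Dense(NUM_OPTIONS)(x)
    return tf.keras.Model(tokens, logits)

model = build_model()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # initial lr 0.001
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(tokens, labels):
    with tf.GradientTape() as tape:
        logits = model(tokens, training=True)  # training=True enables dropout
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    # Clip the global L2 norm of the gradients to no larger than 10.
    grads, _ = tf.clip_by_global_norm(grads, CLIP_NORM)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Example call with a dummy mini-batch of 32 padded token sequences.
dummy_tokens = np.random.randint(0, VOCAB_SIZE, size=(BATCH_SIZE, 50)).astype("int32")
dummy_labels = np.random.randint(0, NUM_OPTIONS, size=(BATCH_SIZE,))
print(float(train_step(dummy_tokens, dummy_labels)))
```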