Hierarchical Attention Flow for Multiple-Choice Reading Comprehension
Authors: Haichao Zhu, Furu Wei, Bing Qin, Ting Liu
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a large-scale multiple-choice reading comprehension dataset (i.e., the RACE dataset), the proposed model outperforms two previous neural network baselines on both the RACE-M and RACE-H subsets and yields state-of-the-art overall results. |
| Researcher Affiliation | Collaboration | Haichao Zhu, Furu Wei, Bing Qin, Ting Liu; SCIR, Harbin Institute of Technology, China; Microsoft Research, Beijing, China; {hczhu, qinb, tliu}@ir.hit.edu.cn, fuwei@microsoft.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | RACE (Lai et al. 2017) and MCTest (Richardson, Burges, and Renshaw 2013) are two representative benchmark datasets generated by humans for multiple-choice reading comprehension. The Large-scale ReAding Comprehension Dataset from Examinations (RACE) is a multiple-choice reading comprehension dataset. |
| Dataset Splits | Yes | RACE contains 27,933 passages and 97,687 questions in total, with 5% held out as the development set and 5% as the test set. Table 1 shows the split of the dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'Natural Language Toolkit' (NLTK) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | To train the model, we adopt stochastic gradient descent with the Adam optimizer (Kingma and Ba 2015) and an initial learning rate of 0.001. Gradients are clipped in L2-norm to no larger than 10. A mini-batch of 32 samples is used to update the model parameters per step. We keep the 50,000 most frequent words in the training set as the vocabulary and add a special token UNK for out-of-vocabulary (OOV) words. The hidden state size of all GRU networks is 128. We apply dropout (Srivastava et al. 2014) to word embeddings and BiGRU outputs with a drop rate of 0.4. |
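
The vocabulary handling described in the Experiment Setup row (the 50,000 most frequent training-set words plus a special UNK token for OOV words) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: NLTK is used only because the paper names the Natural Language Toolkit without specifying tokenizer settings, and the helper names `build_vocab` and `encode` are hypothetical.

```python
from collections import Counter
from nltk.tokenize import word_tokenize  # NLTK needs the 'punkt' tokenizer data downloaded

VOCAB_LIMIT = 50_000  # the 50,000 most frequent training-set words
UNK = "UNK"           # special token for out-of-vocabulary words

def build_vocab(training_texts):
    """Map the most frequent training-set words to integer ids; id 0 is UNK."""
    counts = Counter()
    for text in training_texts:
        counts.update(word_tokenize(text))
    vocab = {UNK: 0}
    for word, _ in counts.most_common(VOCAB_LIMIT):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert a passage/question/option string to ids, sending OOV words to UNK."""
    return [vocab.get(token, vocab[UNK]) for token in word_tokenize(text)]
```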
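
Likewise, the reported optimizer and network hyperparameters (Adam with initial learning rate 0.001, gradients clipped in L2-norm to at most 10, GRU hidden size 128, dropout 0.4 on embeddings and BiGRU outputs, mini-batches of 32) map onto a minimal TensorFlow/Keras configuration like the sketch below. The encoder is a generic bidirectional GRU stand-in rather than the paper's hierarchical attention flow architecture, and the embedding dimension is an assumption since it is not reported in this summary.

```python
import tensorflow as tf

VOCAB_SIZE = 50_000 + 1  # 50,000 most frequent words plus the UNK token
EMBED_DIM = 100          # assumption: the embedding size is not reported in this summary
HIDDEN_SIZE = 128        # hidden state size of all GRU networks
DROP_RATE = 0.4          # dropout on word embeddings and BiGRU outputs
BATCH_SIZE = 32          # mini-batch size per parameter update
LEARNING_RATE = 1e-3     # initial learning rate for Adam
CLIP_NORM = 10.0         # gradients clipped in L2-norm to no larger than 10

def build_bigru_encoder():
    """Generic BiGRU encoder reflecting the reported hyperparameters."""
    tokens = tf.keras.Input(shape=(None,), dtype="int32")
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
    x = tf.keras.layers.Dropout(DROP_RATE)(x)   # dropout on word embeddings
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(HIDDEN_SIZE, return_sequences=True))(x)
    x = tf.keras.layers.Dropout(DROP_RATE)(x)   # dropout on BiGRU outputs
    return tf.keras.Model(tokens, x)

# Adam with the reported learning rate; global_clipnorm (TF >= 2.4) clips the
# combined gradient L2 norm, one plausible reading of the paper's clipping rule.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, global_clipnorm=CLIP_NORM)
```

Training would then proceed with mini-batches of `BATCH_SIZE` samples per parameter update, as stated in the setup; all other architectural details belong to the paper's hierarchical attention flow model and are not reproduced here.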