Employing External Rich Knowledge for Machine Comprehension

Authors: Bingning Wang, Shangmin Guo, Kang Liu, Shizhu He, Jun Zhao

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we build an attention-based recurrent neural network model, train it with the help of external knowledge which is semantically relevant to machine comprehension, and achieve a new state-of-the-art result."
Researcher Affiliation | Academia | "National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences {bingning.wang, shangmin.guo, kliu, shizhu.he, jzhao}@nlpr.ia.ac.cn"
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper makes no explicit statement about releasing its source code and gives no link to a repository implementing the described methodology.
Open Datasets | Yes | "Richardson [2013] introduced MCTest, a dataset of narrative stories with a set of questions."
Dataset Splits | Yes | "They divide the dataset into two parts, namely MC160 and MC500, which contain 160 and 500 stories respectively, and each part is divided into training, development and test sets."
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using word2vec but does not give a version number for it or for any other key software components or libraries required for reproducibility.
Experiment Setup | Yes | "The LSTM hidden state is activated by tanh and the hidden vector length is 178. It has been proved by Pascanu [2013] that the vanishing and exploding gradient issues in RNNs depend on the largest singular value, so we initialize all hidden-to-hidden weight matrices in the LSTM by fixing their largest singular value to 1. For regularization, we add an L2 penalty with a coefficient of 10^-5. Dropout [Srivastava et al., 2014] is further applied to both weights and embeddings: all hidden layers are dropped out by 30%, and embeddings by 40%. The max-margin M is set to 0.12 based on development-set behavior."
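The spectral initialization and margin objective quoted in the Experiment Setup row are easy to misread, so below is a minimal NumPy sketch of both. The initializer follows the quoted description directly; the exact form of the max-margin loss is an assumption (the paper only states the margin value M = 0.12), and all function and variable names here are hypothetical, not taken from the authors' code.

import numpy as np

def init_hidden_to_hidden(dim, rng=None):
    """Draw a random hidden-to-hidden matrix, then rescale it so that
    its largest singular value (spectral norm) is exactly 1, as the
    quoted setup describes (motivated by Pascanu [2013])."""
    rng = np.random.default_rng(0) if rng is None else rng
    W = rng.normal(0.0, 0.1, size=(dim, dim))
    sigma_max = np.linalg.svd(W, compute_uv=False)[0]  # largest singular value
    return W / sigma_max

def max_margin_loss(score_correct, scores_wrong, margin=0.12):
    """ASSUMED hinge-style ranking loss: push the correct answer's score
    above every distractor's score by at least `margin`. The paper only
    quotes M = 0.12; this exact functional form is our assumption."""
    return sum(max(0.0, margin - score_correct + s) for s in scores_wrong)

# Hyper-parameters quoted in the setup above:
HIDDEN_DIM = 178        # LSTM hidden vector length
L2_COEF = 1e-5          # L2 penalty coefficient (10^-5)
DROPOUT_HIDDEN = 0.30   # dropout rate on hidden layers
DROPOUT_EMBED = 0.40    # dropout rate on embeddings

W_hh = init_hidden_to_hidden(HIDDEN_DIM)
# The rescaled matrix has spectral norm 1 by construction.
assert np.isclose(np.linalg.svd(W_hh, compute_uv=False)[0], 1.0)

Dividing the whole matrix by its largest singular value scales every singular value by the same factor, so the spectral norm lands exactly at 1, which is one straightforward reading of "fixing its largest singular value to 1" in the quoted setup.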