Employing External Rich Knowledge for Machine Comprehension
Authors: Bingning Wang, Shangmin Guo, Kang Liu, Shizhu He, Jun Zhao
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we build an attention-based recurrent neural network model, train it with the help of external knowledge which is semantically relevant to machine comprehension, and achieve a new state-of-the-art result. |
| Researcher Affiliation | Academia | National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences {bingning.wang, shangmin.guo, kliu, shizhu.he, jzhao}@nlpr.ia.ac.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing its source code or a link to a repository containing its implementation for the described methodology. |
| Open Datasets | Yes | Richardson et al. [2013] introduced MCTest, a dataset of narrative stories with a set of questions. |
| Dataset Splits | Yes | They divide the dataset into two parts, MC160 and MC500, which contain 160 and 500 stories respectively; each part is divided into training, development, and test sets. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'word2vec' but does not specify a version number for it or any other key software components or libraries required for reproducibility. |
| Experiment Setup | Yes | The LSTM hidden state is activated by tanh and the hidden vector length is 178. It has been shown by Pascanu [2013] that the vanishing and exploding gradient issues in RNNs depend on the largest singular value, so we initialize all hidden-to-hidden weight matrices in the LSTM by fixing their largest singular value to 1. For regularization, we add an L2 penalty with a coefficient of 10^-5. Dropout [Srivastava et al., 2014] is further applied to both weights and embeddings. All hidden layers are dropped out by 30%, and embeddings by 40%. The max-margin M is set to 0.12 based on development set behavior. |
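
Since no code is released, the following is a minimal sketch (assuming NumPy, with all variable names illustrative rather than taken from the paper) of how the quoted hyperparameters could be set up, including the hidden-to-hidden initialization that fixes the largest singular value to 1:

```python
# Hedged sketch of the reported experiment setup; not the authors' implementation.
import numpy as np

HIDDEN_SIZE = 178      # LSTM hidden vector length reported in the paper
L2_COEFF = 1e-5        # L2 penalty coefficient
DROPOUT_HIDDEN = 0.30  # dropout rate on hidden layers
DROPOUT_EMBED = 0.40   # dropout rate on embeddings
MARGIN = 0.12          # max-margin M, chosen on the development set

def init_recurrent_weight(size, rng=np.random):
    """Random square matrix rescaled so its largest singular value equals 1."""
    w = rng.randn(size, size).astype(np.float32)
    largest_sv = np.linalg.svd(w, compute_uv=False)[0]  # singular values are sorted descending
    return w / largest_sv

W_hh = init_recurrent_weight(HIDDEN_SIZE)
# Sanity check: the top singular value is now 1 (up to floating-point error).
assert np.isclose(np.linalg.svd(W_hh, compute_uv=False)[0], 1.0, atol=1e-4)
```

The rescaling step is the only non-trivial part: dividing a random matrix by its largest singular value guarantees that value becomes exactly 1, which is the spectral condition the paper cites from Pascanu [2013] to control vanishing and exploding gradients.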