A Question-Focused Multi-Factor Attention Network for Question Answering

Authors: Souvik Kundu, Hwee Tou Ng

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
LLM Response: Our proposed model achieves significant improvements over the best prior state-of-the-art results on three large-scale challenging QA datasets, namely NewsQA, TriviaQA, and SearchQA. We evaluated AMANDA on three challenging QA datasets: NewsQA, TriviaQA, and SearchQA. Table 5 shows that AMANDA performs better than any of the ablated models, which include the ablation of multi-factor attentive encoding, max-attentional question aggregation (q_ma), and question type representation (q_f).

Researcher Affiliation: Academia
LLM Response: Souvik Kundu, Hwee Tou Ng, Department of Computer Science, National University of Singapore, {souvik, nght}@comp.nus.edu.sg

Pseudocode: No
LLM Response: The paper describes the model's architecture and mathematical formulations but does not include any structured pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: Our code is available at https://github.com/nusnlp/amanda

Open Datasets: Yes
LLM Response: We evaluated AMANDA on three challenging QA datasets: NewsQA (Trischler et al. 2016), TriviaQA (Joshi et al. 2017), and SearchQA (Dunn et al. 2017).

Dataset Splits: Yes
LLM Response: We evaluated AMANDA on three challenging QA datasets: NewsQA, TriviaQA, and SearchQA. Using the NewsQA development set as a benchmark, we perform rigorous analysis for better understanding of how our proposed model works. Table 2 reports results on the NewsQA dataset for the dev and test splits (EM and F1).

Hardware Specification: No
LLM Response: The paper does not explicitly specify any hardware details such as GPU models, CPU types, or memory used for the experiments.

Software Dependencies: No
LLM Response: The paper mentions software such as NLTK (http://www.nltk.org/), GloVe (Pennington, Socher, and Manning 2014), and the Adam optimizer (Kingma and Ba 2015), but it does not provide specific version numbers for these libraries or for the underlying deep learning framework (e.g., PyTorch, TensorFlow).

Experiment Setup: Yes
LLM Response: We use the 300-dimension pre-trained word vectors from GloVe (Pennington, Socher, and Manning 2014) and we do not update them during training. The out-of-vocabulary words are initialized with zero vectors. We use 50-dimension character-level embedding vectors. The number of hidden units in all the LSTMs is 150. We use dropout (Srivastava et al. 2014) with probability 0.3 for every learnable layer. For multi-factor attentive encoding, we choose 4 factors (m) based on our experimental findings (refer to Table 7). During training, the minibatch size is fixed at 60. We use the Adam optimizer (Kingma and Ba 2015) with a learning rate of 0.001 and clipnorm of 5.
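To make the quoted experiment setup concrete, here is a minimal configuration sketch. The paper does not name its deep learning framework, so the use of PyTorch, the module layout, and the placeholder vocabulary sizes below are assumptions for illustration only (the authors' actual implementation is at https://github.com/nusnlp/amanda); only the hyperparameter values are taken from the quoted text.

```python
# Hypothetical sketch of the training configuration reported in the paper.
# Hyperparameter values (dimensions, dropout, batch size, optimizer settings)
# come from the quoted setup; the framework choice (PyTorch), module names,
# and the placeholder vocabulary sizes are illustrative assumptions.
import torch
import torch.nn as nn

WORD_EMB_DIM = 300    # pre-trained GloVe vectors, not updated during training
CHAR_EMB_DIM = 50     # character-level embedding size
HIDDEN_UNITS = 150    # hidden units in all LSTMs
NUM_FACTORS = 4       # m, number of factors for multi-factor attentive encoding
DROPOUT = 0.3         # applied to every learnable layer
BATCH_SIZE = 60
LEARNING_RATE = 0.001
CLIPNORM = 5.0

VOCAB_SIZE = 50_000   # placeholder; the actual vocabulary depends on the dataset
NUM_CHARS = 100       # placeholder character vocabulary size

# Frozen GloVe word embeddings; out-of-vocabulary rows stay as zero vectors.
glove_matrix = torch.zeros(VOCAB_SIZE, WORD_EMB_DIM)  # fill from GloVe where available
word_emb = nn.Embedding.from_pretrained(glove_matrix, freeze=True)

char_emb = nn.Embedding(NUM_CHARS, CHAR_EMB_DIM)
dropout = nn.Dropout(DROPOUT)

# Contextual encoder over concatenated word + (aggregated) character embeddings.
encoder = nn.LSTM(input_size=WORD_EMB_DIM + CHAR_EMB_DIM,
                  hidden_size=HIDDEN_UNITS,
                  bidirectional=True,
                  batch_first=True)

# Placeholder container standing in for the full AMANDA network.
model = nn.ModuleList([word_emb, char_emb, encoder])

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=LEARNING_RATE)

def train_step(loss: torch.Tensor) -> None:
    """One illustrative optimization step with gradient clipping at norm 5."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(trainable, max_norm=CLIPNORM)
    optimizer.step()
```

The multi-factor attentive encoding (m = 4), max-attentional question aggregation, and question type representation layers themselves are not sketched here; only the embedding, encoder, dropout, and optimizer settings listed in the quoted setup are shown.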