A Question-Focused Multi-Factor Attention Network for Question Answering
Authors: Souvik Kundu, Hwee Tou Ng
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed model achieves significant improvements over the best prior state-of-the-art results on three large-scale challenging QA datasets, namely NewsQA, TriviaQA, and SearchQA. We evaluated AMANDA on three challenging QA datasets: NewsQA, TriviaQA, and SearchQA. Table 5 shows that AMANDA performs better than any of the ablated models, which include the ablation of multi-factor attentive encoding, max-attentional question aggregation (qma), and question type representation (qf). |
| Researcher Affiliation | Academia | Souvik Kundu, Hwee Tou Ng Department of Computer Science National University of Singapore {souvik, nght}@comp.nus.edu.sg |
| Pseudocode | No | The paper describes the model's architecture and mathematical formulations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/nusnlp/amanda |
| Open Datasets | Yes | We evaluated AMANDA on three challenging QA datasets: NewsQA (Trischler et al. 2016), TriviaQA (Joshi et al. 2017), and SearchQA (Dunn et al. 2017). |
| Dataset Splits | Yes | We evaluated AMANDA on three challenging QA datasets: NewsQA, TriviaQA, and SearchQA. Using the NewsQA development set as a benchmark, we perform rigorous analysis for better understanding of how our proposed model works. Table 2 reports results on the NewsQA dataset for both the Dev and Test splits (EM and F1). |
| Hardware Specification | No | The paper does not explicitly specify any hardware details such as GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions software like NLTK (http://www.nltk.org/), GloVe (Pennington, Socher, and Manning 2014), and Adam optimizer (Kingma and Ba 2015), but it does not provide specific version numbers for these libraries or the underlying deep learning framework (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use the 300-dimension pre-trained word vectors from GloVe (Pennington, Socher, and Manning 2014) and we do not update them during training. The out-of-vocabulary words are initialized with zero vectors. We use 50-dimension character-level embedding vectors. The number of hidden units in all the LSTMs is 150. We use dropout (Srivastava et al. 2014) with probability 0.3 for every learnable layer. For multi-factor attentive encoding, we choose 4 factors (m) based on our experimental findings (refer to Table 7). During training, the minibatch size is fixed at 60. We use the Adam optimizer (Kingma and Ba 2015) with a learning rate of 0.001 and clipnorm of 5. |
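
The experiment setup reported above maps directly onto a training configuration. Below is a minimal sketch that collects those reported hyperparameters into a Python configuration dictionary; the field names (e.g., `char_emb_dim`, `num_factors`) are our own illustrative choices and do not necessarily match the released code at https://github.com/nusnlp/amanda.

```python
# Hypothetical configuration sketch of the hyperparameters reported in the paper.
# Field names are illustrative, not taken from the authors' repository.

AMANDA_CONFIG = {
    "word_emb": {
        "source": "GloVe",       # 300-d pre-trained vectors (Pennington, Socher, and Manning 2014)
        "dim": 300,
        "trainable": False,      # word embeddings are not updated during training
        "oov_init": "zeros",     # out-of-vocabulary words initialized with zero vectors
    },
    "char_emb_dim": 50,          # character-level embedding size
    "lstm_hidden_units": 150,    # hidden size for all LSTMs
    "dropout": 0.3,              # applied to every learnable layer
    "num_factors": 4,            # m, for multi-factor attentive encoding (see Table 7)
    "batch_size": 60,            # minibatch size during training
    "optimizer": {
        "name": "adam",          # Adam optimizer (Kingma and Ba 2015)
        "learning_rate": 0.001,
        "clipnorm": 5.0,         # gradient norm clipping threshold
    },
}

if __name__ == "__main__":
    # Quick sanity check; a real training script would pass this dict to a model builder.
    import json
    print(json.dumps(AMANDA_CONFIG, indent=2))
```

A reproduction attempt would still need to pin the deep learning framework and library versions, since the paper does not specify them (see the Software Dependencies row above).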