Dynamic Coattention Networks For Question Answering
Authors: Caiming Xiong, Victor Zhong, Richard Socher
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1. |
| Researcher Affiliation | Industry | Caiming Xiong, Victor Zhong, Richard Socher, Salesforce Research, Palo Alto, CA 94301, USA {cxiong, vzhong, rsocher}@salesforce.com |
| Pseudocode | No | The paper contains architectural diagrams and mathematical equations but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'All models are implemented and trained with Chainer (Tokui et al., 2015).' but does not provide a link to the authors' own implementation of the DCN or explicitly state that their code is open-source or publicly available. |
| Open Datasets | Yes | Recently, Rajpurkar et al. (2016) released the Stanford Question Answering dataset (SQuAD) |
| Dataset Splits | Yes | The official SQuAD evaluation is hosted on CodaLab. The training and development sets are publicly available while the test set is withheld. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, processor types, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions 'We use the tokenizer from Stanford CoreNLP (Manning et al., 2014). All models are implemented and trained with Chainer (Tokui et al., 2015).' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use a max sequence length of 600 during training and a hidden state size of 200 for all recurrent units, maxout layers, and linear layers. All LSTMs have randomly initialized parameters and an initial state of zero. Sentinel vectors are randomly initialized and optimized during training. For the dynamic decoder, we set the maximum number of iterations to 4 and use a maxout pool size of 16. We use dropout to regularize our network during training (Srivastava et al., 2014), and optimize the model using ADAM (Kingma & Ba, 2014). |
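The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration sketch. This is an illustrative Python snippet, not the authors' code (which is not released); the key names are hypothetical, and values not stated in the quoted excerpt (e.g., the dropout rate and ADAM settings) are left as placeholders.

```python
# Minimal sketch of the DCN training configuration reported in the paper.
# Key names are illustrative, not identifiers from the authors' implementation.
dcn_config = {
    "max_sequence_length": 600,   # max document length during training
    "hidden_size": 200,           # recurrent units, maxout layers, linear layers
    "max_decoder_iterations": 4,  # cap on dynamic decoder iterations
    "maxout_pool_size": 16,       # maxout pool size in the decoder
    "optimizer": "adam",          # Kingma & Ba (2014)
    "dropout_rate": None,         # used for regularization; rate not stated in the excerpt
}

if __name__ == "__main__":
    for key, value in dcn_config.items():
        print(f"{key}: {value}")
```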