Dynamic Coattention Networks For Question Answering

Authors: Caiming Xiong, Victor Zhong, Richard Socher

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.
Researcher Affiliation | Industry | Caiming Xiong, Victor Zhong, Richard Socher; Salesforce Research, Palo Alto, CA 94301, USA; {cxiong, vzhong, rsocher}@salesforce.com
Pseudocode | No | The paper contains architectural diagrams and mathematical equations but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states "All models are implemented and trained with Chainer (Tokui et al., 2015)." but does not provide a link to the authors' own implementation of the DCN or explicitly state that their code is open source or publicly available.
Open Datasets | Yes | "Recently, Rajpurkar et al. (2016) released the Stanford Question Answering dataset (SQuAD)."
Dataset Splits | Yes | The official SQuAD evaluation is hosted on CodaLab. The training and development sets are publicly available while the test set is withheld.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, processor types, or memory used for its experiments.
Software Dependencies | No | The paper mentions "We use the tokenizer from Stanford CoreNLP (Manning et al., 2014). All models are implemented and trained with Chainer (Tokui et al., 2015)." but does not give version numbers for these software components.
Experiment Setup | Yes | We use a max sequence length of 600 during training and a hidden state size of 200 for all recurrent units, maxout layers, and linear layers. All LSTMs have randomly initialized parameters and an initial state of zero. Sentinel vectors are randomly initialized and optimized during training. For the dynamic decoder, we set the maximum number of iterations to 4 and use a maxout pool size of 16. We use dropout to regularize our network during training (Srivastava et al., 2014), and optimize the model using ADAM (Kingma & Ba, 2014).
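For readers attempting to reproduce the setup, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. The Python sketch below is a minimal, illustrative summary only; the key names are assumptions introduced here and do not come from the paper or any released implementation, and values not stated in the quoted text (e.g., dropout rate, learning rate) are deliberately left out.

# Minimal sketch: DCN training hyperparameters as quoted above, gathered into a
# plain Python dict. Key names are illustrative assumptions, not from the paper
# or any official code release.
DCN_TRAIN_CONFIG = {
    "max_sequence_length": 600,     # max sequence length used during training
    "hidden_size": 200,             # shared by recurrent units, maxout layers, linear layers
    "lstm_initial_state": 0.0,      # LSTMs start from a zero state; weights randomly initialized
    "sentinel_vectors": "learned",  # randomly initialized, optimized during training
    "decoder_max_iterations": 4,    # cap on dynamic decoder iterations
    "maxout_pool_size": 16,
    "regularizer": "dropout",       # Srivastava et al., 2014 (rate not reported in the quoted text)
    "optimizer": "adam",            # Kingma & Ba, 2014 (learning rate not reported in the quoted text)
}

if __name__ == "__main__":
    # Print the reported settings for a quick reproducibility checklist.
    for name, value in DCN_TRAIN_CONFIG.items():
        print(f"{name}: {value}")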