Dynamic Memory Networks for Visual and Textual Question Answering

Authors: Caiming Xiong, Stephen Merity, Richard Socher

ICML 2016

Reproducibility Variable Result LLM Response
Research Type Experimental Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
Researcher Affiliation Industry Caiming Xiong*, Stephen Merity*, Richard Socher {CXIONG,SMERITY,RSOCHER}@SALESFORCE.COM, Salesforce Inc., CA USA
Pseudocode Yes For each time step i with input x_i and previous hidden state h_{i-1}, we compute the updated hidden state h_i = GRU(x_i, h_{i-1}) by u_i = σ(W^{(u)} x_i + U^{(u)} h_{i-1} + b^{(u)}); r_i = σ(W^{(r)} x_i + U^{(r)} h_{i-1} + b^{(r)}); h̃_i = tanh(W x_i + r_i ∘ U h_{i-1} + b^{(h)}); h_i = u_i ∘ h̃_i + (1 − u_i) ∘ h_{i-1} (4)
Open Source Code No The paper does not provide any concrete access to source code through a specific repository link, an explicit code release statement, or mention of code in supplementary materials.
Open Datasets Yes For evaluating the DMN on textual question answering, we use bAbI-10k English (Weston et al., 2015a; Sukhbaatar et al., 2015), a synthetic dataset which features 20 different tasks. ... The Visual Question Answering (VQA) dataset was constructed using the Microsoft COCO dataset (Lin et al., 2014)...
Dataset Splits Yes This dataset contains 248,349 training questions, 121,512 validation questions, and 244,302 for testing. ... The last 10% of the training data on each task was chosen as the validation set.
Hardware Specification No The paper mentions using a 'convolutional neural network (Krizhevsky et al., 2012) based upon the VGG-19 model (Simonyan & Zisserman, 2014)', which implies GPU usage, but it does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies No The paper mentions software components like 'Adam optimizer (Kingma & Ba, 2014)' and 'VGG-19 model (Simonyan & Zisserman, 2014)' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup Yes We trained our models using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001 and batch size of 128. Training runs for up to 256 epochs with early stopping if the validation loss had not improved within the last 20 epochs. The model from the epoch with the lowest validation loss was then selected. Xavier initialization was used for all weights except for the word embeddings, which used random uniform initialization with range [-0.1, 0.1]. Both the embedding and hidden dimensions were of size d = 80. We used ℓ2 regularization on all weights except bias and used dropout on the initial sentence encodings and the answer module, keeping the input with probability p = 0.9. ... For the VQA dataset, ... learning rate of 0.003 and batch size of 100. ... dropout on the initial image output from the VGG convolutional neural network ... keeping input with probability p = 0.5.
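The GRU update quoted in the Pseudocode row above can be sketched as a minimal NumPy step. This is an illustrative reconstruction only: the parameter names (W_u, U_u, b_u, etc.) and the random initialization are assumptions, since the paper released no code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_i, h_prev, params):
    """One GRU step h_i = GRU(x_i, h_{i-1}), following the paper's Eq. (4).

    `params` maps illustrative names to weight matrices and bias vectors;
    these names are hypothetical, not taken from any released implementation.
    """
    # Update gate: how much of the new candidate state to keep
    u = sigmoid(params["W_u"] @ x_i + params["U_u"] @ h_prev + params["b_u"])
    # Reset gate: how much of the previous state feeds the candidate
    r = sigmoid(params["W_r"] @ x_i + params["U_r"] @ h_prev + params["b_r"])
    # Candidate hidden state
    h_tilde = np.tanh(params["W"] @ x_i + r * (params["U"] @ h_prev) + params["b_h"])
    # Gated interpolation between candidate and previous hidden state
    return u * h_tilde + (1.0 - u) * h_prev

# Toy usage with d = 80, matching the hidden/embedding size in the setup above
d = 80
rng = np.random.default_rng(0)
params = {k: rng.standard_normal((d, d)) * 0.01
          for k in ("W_u", "U_u", "W_r", "U_r", "W", "U")}
params.update({k: np.zeros(d) for k in ("b_u", "b_r", "b_h")})
h = gru_step(rng.standard_normal(d), np.zeros(d), params)
```

With a zero initial hidden state, the output reduces to u ∘ h̃, so every component stays in (−1, 1); the same step function is applied at each time step to carry the hidden state forward.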