Learning Conditioned Graph Structures for Interpretable Visual Question Answering

Authors: Will Norcliffe-Brown, Stathis Vafeias, Sarah Parisot

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph learner module.
Researcher Affiliation | Industry | Will Norcliffe-Brown, Aim Brain Ltd., will.norcliffe@aimbrain.com; Efstathios Vafeias, Aim Brain Ltd., stathis@aimbrain.com; Sarah Parisot, Aim Brain Ltd., sarah@aimbrain.com
Pseudocode | No | The paper describes the model architecture and components in detail, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code can be found at github.com/aimbrain/vqa-project.
Open Datasets | Yes | We evaluate our model using the VQA 2.0 dataset [20], which contains a total of 1,105,904 questions and about 204,721 images from the COCO dataset.
Dataset Splits | Yes | The dataset is split roughly into proportions of 40%, 20%, and 40% for the train, validation, and test sets respectively.
Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., GPU models, CPU types, or memory).
Software Dependencies | No | The paper mentions using a dynamic Gated Recurrent Unit (GRU) [17] and the Adam optimizer [22], but does not specify software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | Our question encoder is a dynamic Gated Recurrent Unit (GRU) [17] with a hidden state size of 1024 (dq = 1024). Our function F (see Eq. 1), which learns the adjacency matrix, comprises two dense linear layers of size 512 (dg = 512). We use L = 2 spatial graph convolution layers of dimensions 2048 and 1024 (dh1 = 2048, dh2 = 1024). All dense and convolutional layers use Rectified Linear Unit (ReLU) activations. During training we apply dropout with probability 0.5 to the image features and to all but the final dense layer. We train for 35 epochs with a batch size of 64 and the Adam optimizer [22] with a learning rate of 0.0001, which we halve after the 30th epoch.
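The reported hyperparameters can be collected into a single configuration sketch. This is not the authors' code (their implementation is at github.com/aimbrain/vqa-project); the names below are illustrative, and only the numeric values come from the paper. The schedule function encodes the stated rule that the Adam learning rate of 0.0001 is halved after the 30th epoch.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are hypothetical; values are taken from the paper.
CONFIG = {
    "d_q": 1024,            # GRU question-encoder hidden state size
    "d_g": 512,             # each of the two dense layers in the graph learner F
    "d_h": (2048, 1024),    # L = 2 spatial graph convolution layer dimensions
    "dropout": 0.5,         # on image features and all but the final dense layer
    "epochs": 35,
    "batch_size": 64,
    "base_lr": 1e-4,        # Adam optimizer learning rate
}

def learning_rate(epoch: int, base_lr: float = CONFIG["base_lr"]) -> float:
    """Constant base rate for epochs 1-30, halved thereafter (1-indexed epochs)."""
    return base_lr / 2 if epoch > 30 else base_lr
```

For example, `learning_rate(30)` returns the base rate of 1e-4, while `learning_rate(31)` returns 5e-5 for the final five epochs.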