Learning Conditioned Graph Structures for Interpretable Visual Question Answering
Authors: Will Norcliffe-Brown, Stathis Vafeias, Sarah Parisot
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our approach on the VQA v2 dataset using a simple baseline architecture enhanced by the proposed graph learner module. |
| Researcher Affiliation | Industry | Will Norcliffe-Brown Aim Brain Ltd. will.norcliffe@aimbrain.com Efstathios Vafeias Aim Brain Ltd. stathis@aimbrain.com Sarah Parisot Aim Brain Ltd. sarah@aimbrain.com |
| Pseudocode | No | The paper describes the model architecture and components in detail, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at github.com/aimbrain/vqa-project. |
| Open Datasets | Yes | We evaluate our model using the VQA 2.0 dataset [20] which contains a total of 1,105,904 questions and about 204,721 images from the COCO dataset. |
| Dataset Splits | Yes | The dataset is split up roughly into proportions of 40%, 20%, 40% for train, validation and test sets respectively. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions using a 'dynamic Gated Recurrent Unit (GRU) [17]' and 'Adam optimizer [22]' but does not specify software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | Our question encoder is a dynamic Gated Recurrent Unit (GRU) [17] with a hidden state size of 1024 (dq = 1024). Our function F (see Eq. 1), which learns the adjacency matrix, comprises two dense linear layers of size 512 (dg = 512). We use L=2 spatial graph convolution layers of dimensions 2048 and 1024 so that (dh1 = 2048, dh2 = 1024). All dense layers and convolutional layers are activated using Rectified Linear Unit (ReLU) activation functions. During training we use dropout on the image features and on all but the final dense layer's nodes with a 0.5 probability. We train for 35 epochs using a batch size of 64 and the Adam optimizer [22] with a learning rate of 0.0001, which we halve after the 30th epoch. |
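The reported hyperparameters can be collected into a single configuration for reimplementation. The sketch below is a minimal, framework-agnostic rendering of the quoted setup; the class and function names are hypothetical (the authors' actual implementation is at github.com/aimbrain/vqa-project), and only the numeric values come from the paper.

```python
# Hypothetical config sketch for the reported training setup.
# Names (VQAGraphConfig, learning_rate) are illustrative, not from the paper's code.
from dataclasses import dataclass, field
from typing import List


@dataclass
class VQAGraphConfig:
    d_q: int = 1024              # GRU question-encoder hidden state size
    d_g: int = 512               # each of the two dense layers in F (Eq. 1)
    graph_conv_dims: List[int] = field(
        default_factory=lambda: [2048, 1024]  # d_h1, d_h2 (L = 2 layers)
    )
    dropout: float = 0.5         # on image features and all but the final dense layer
    epochs: int = 35
    batch_size: int = 64
    base_lr: float = 1e-4        # Adam optimizer
    lr_halve_epoch: int = 30     # learning rate halved after the 30th epoch


def learning_rate(cfg: VQAGraphConfig, epoch: int) -> float:
    """Step schedule described in the paper: constant, then halved after epoch 30.

    Epochs are 1-indexed here; epochs 1-30 use the base rate, 31-35 use half.
    """
    return cfg.base_lr / 2 if epoch > cfg.lr_halve_epoch else cfg.base_lr


cfg = VQAGraphConfig()
print(learning_rate(cfg, 30))  # 0.0001
print(learning_rate(cfg, 31))  # 5e-05
```

A dataclass like this makes the schedule and layer sizes explicit and easy to audit against the quoted text, which is useful given that the paper does not pin software versions.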