From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Authors: Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the performance of our proposed CVA on three public image QA datasets, including COCO-QA, VQA and Visual7W. Experimental results show that our proposed method significantly outperforms the state-of-the-arts. |
| Researcher Affiliation | Academia | Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations, but it does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | We evaluate our proposed model on three public image QA datasets: the COCO-QA dataset [Ren et al., 2015], the VQA dataset (collected from the newly-released Microsoft Common Objects in Context (MS COCO) dataset), and the Visual7W dataset (collected by Zhu et al. [Zhu et al., 2016]). |
| Dataset Splits | Yes | For the VQA dataset, 204,721 real images (123,287 for training and validation, 81,434 for testing) are collected from the newly-released Microsoft Common Objects in Context (MS COCO) dataset. The paper also mentions the 'test-dev' split for debugging and validation purposes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as 'Faster R-CNN', 'ResNet-101', 'GloVe word embedding', and the 'Adam' optimizer, but it does not specify version numbers for any of these components. |
| Experiment Setup | Yes | The paper states: 'For extracting visual object features... select top 36 (k = 36) object regions and each region is represented as 2,048 dimensional features.' 'the dimension of every hidden layer including GRU, attention models and the final joint feature embedding is set as 1,024.' 'our models are trained with Adam. The batch size is set to 256, and the epoch is set as 30. More specifically, gradient clipping technology and dropout are exploited in training.' (These settings are sketched in code below the table.) |
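
The Experiment Setup row above lists concrete hyperparameters. The following is a minimal, hedged PyTorch sketch of that training configuration, not the authors' implementation: the placeholder model body, dropout rate, gradient-clipping threshold, learning rate, vocabulary/answer sizes, and data loader are all assumptions. Only k = 36 regions, 2,048-dimensional region features, 1,024-dimensional hidden layers, the Adam optimizer, batch size 256, 30 epochs, and the use of gradient clipping and dropout come from the paper.

```python
# Sketch of the reported training configuration (not the authors' CVA code).
import torch
import torch.nn as nn

K_REGIONS, REGION_DIM, HIDDEN_DIM = 36, 2048, 1024   # from the paper
BATCH_SIZE, EPOCHS = 256, 30                          # from the paper
GRAD_CLIP = 0.25                                      # clip value is an assumption

class PlaceholderVQAModel(nn.Module):
    """Stand-in for the CVA model: fuses region features with a question vector."""
    def __init__(self, num_answers=3000, vocab_size=10000):   # sizes assumed
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)             # GloVe-sized embedding
        self.gru = nn.GRU(300, HIDDEN_DIM, batch_first=True)   # 1,024-d question encoder
        self.visual_proj = nn.Linear(REGION_DIM, HIDDEN_DIM)   # 2,048 -> 1,024
        self.dropout = nn.Dropout(0.5)                          # dropout rate assumed
        self.classifier = nn.Linear(HIDDEN_DIM, num_answers)

    def forward(self, regions, question_tokens):
        # regions: (B, 36, 2048); question_tokens: (B, T)
        _, q = self.gru(self.embed(question_tokens))            # question state (1, B, 1024)
        v = self.visual_proj(regions).mean(dim=1)               # naive pooling, not CVA attention
        joint = self.dropout(q.squeeze(0) * v)                  # simple multiplicative fusion
        return self.classifier(joint)

model = PlaceholderVQAModel()
optimizer = torch.optim.Adam(model.parameters())                # learning rate not reported

def train_one_epoch(loader, criterion=nn.CrossEntropyLoss()):
    # `loader` is a hypothetical DataLoader yielding batches of 256 examples.
    for regions, question_tokens, answers in loader:
        optimizer.zero_grad()
        loss = criterion(model(regions, question_tokens), answers)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)  # gradient clipping
        optimizer.step()
```

Run `train_one_epoch` for 30 epochs to match the reported schedule; the region features themselves would come from a Faster R-CNN detector with a ResNet-101 backbone, as noted in the Software Dependencies row.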