Chain of Reasoning for Visual Question Answering

Authors: Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve new state-of-the-art results on four publicly available datasets. We conduct a detailed ablation study to show that our proposed chain structure is superior to stack structure and parallel structure. |
| Researcher Affiliation | Academia | Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong. Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications. {wuchenfei, liujinlai, xjwang, dongxuan8811}@bupt.edu.cn |
| Pseudocode | No | The paper describes the model architecture and operations using mathematical equations (Eq. 1–13) and textual explanations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | More details, including source codes, will be published in the near future. |
| Open Datasets | Yes | We evaluate our model on four public datasets: the VQA 1.0 dataset [29], the VQA 2.0 dataset [30], the COCO-QA dataset [31] and the TDIUC dataset [27]. |
| Dataset Splits | Yes | For a fair comparison, all the data provided in this section are trained on the VQA 2.0 training set and tested on the VQA 2.0 validation set. |
| Hardware Specification | No | The paper mentions implementing the model in PyTorch but does not specify any hardware details such as GPU models, CPU types, or cloud computing environments used to run the experiments. |
| Software Dependencies | No | We implement the model using PyTorch. We use Adam [35] to train the model. While software names such as PyTorch and Adam are mentioned, no version numbers are provided for these or other dependencies. |
| Experiment Setup | Yes | During the data-embedding phase, the image features are mapped to the size of 36 × 2048 and the text features are mapped to the size of 2400. In the chain of reasoning phase, the hidden-layer size in Mutan is 510; the hyperparameter K is 5. The attention hidden unit number is 620. In the decision-making phase, the joint feature embedding is set to 510. All the nonlinear layers of the model use the ReLU activation function and dropout [34] to prevent overfitting. We use Adam [35] to train the model with a learning rate of 10⁻⁴ and a batch size of 64. (See the configuration sketch below this table.) |
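
The Experiment Setup row pins down most training hyperparameters. As a rough illustration, the sketch below wires the reported values (36 × 2048 image features, 2400-dimensional text features, a 510-dimensional joint embedding, ReLU with dropout, Adam at a 10⁻⁴ learning rate, batch size 64) into a minimal PyTorch training step. The fusion module, the answer-vocabulary size (3000), and the dropout rate (0.5) are hypothetical placeholders; this is not the paper's MUTAN-based chain of reasoning.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
# Only the hyperparameters marked "as reported" come from the paper's setup;
# the fusion block, answer count, and dropout rate are illustrative assumptions.
import torch
import torch.nn as nn


class FusionBlockSketch(nn.Module):
    """Placeholder fusion block: projects image and text features to a 510-d
    joint embedding (as reported) and classifies; stands in for the actual model."""

    def __init__(self, img_dim=2048, txt_dim=2400, joint_dim=510,
                 num_answers=3000, p_drop=0.5):  # num_answers, p_drop: assumed
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Dropout(p_drop), nn.Linear(joint_dim, num_answers)
        )

    def forward(self, img_feat, txt_feat):
        # img_feat: (batch, 36, 2048) region features; txt_feat: (batch, 2400) question embedding
        joint = self.img_proj(img_feat).mean(dim=1) * self.txt_proj(txt_feat)
        return self.classifier(joint)


model = FusionBlockSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 10⁻⁴ (as reported)
criterion = nn.CrossEntropyLoss()
batch_size = 64                                            # as reported

# One illustrative training step on random tensors with the reported shapes.
img = torch.randn(batch_size, 36, 2048)
txt = torch.randn(batch_size, 2400)
labels = torch.randint(0, 3000, (batch_size,))
loss = criterion(model(img, txt), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```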