Chain of Reasoning for Visual Question Answering
Authors: Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve new state-of-the-art results on four publicly available datasets. We conduct a detailed ablation study to show that our proposed chain structure is superior to stack structure and parallel structure. |
| Researcher Affiliation | Academia | Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong. Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications. {wuchenfei, liujinlai, xjwang, dongxuan8811}@bupt.edu.cn |
| Pseudocode | No | The paper describes the model architecture and operations using mathematical equations (Eq. 1-13) and textual explanations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | More details, including source codes, will be published in the near future. |
| Open Datasets | Yes | We evaluate our model on four public datasets: the VQA 1.0 dataset [29], the VQA 2.0 dataset [30], the COCO-QA dataset [31] and the TDIUC dataset [27]. |
| Dataset Splits | Yes | For a fair comparison, all results reported in this section are trained on the VQA 2.0 training set and tested on the VQA 2.0 validation set. |
| Hardware Specification | No | The paper mentions implementing the model using PyTorch but does not specify any hardware details such as GPU models, CPU types, or cloud computing environments used for the experiments. |
| Software Dependencies | No | We implement the model using PyTorch. We use Adam [35] to train the model. While PyTorch and Adam are named, no version numbers are provided for these or other dependencies. |
| Experiment Setup | Yes | During the data-embedding phase, the image features are mapped to a size of 36 × 2048 and the text features to a size of 2400. In the chain-of-reasoning phase, the hidden layer size in Mutan is 510 and the hyperparameter K is 5. The attention hidden unit number is 620. In the decision-making phase, the joint feature embedding is set to 510. All nonlinear layers of the model use the ReLU activation function and dropout [34] to prevent overfitting. We use Adam [35] to train the model with a learning rate of 10⁻⁴ and a batch size of 64. |
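
For readers attempting a reproduction, the quoted setup maps onto a compact configuration. The PyTorch sketch below is not the authors' implementation (their source code was unreleased at the time of this assessment); every module and variable name is an assumption, and the dropout rate is a placeholder since the paper does not report one.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
# All names here are illustrative assumptions, not the authors' identifiers.
CONFIG = {
    "num_regions": 36,        # image regions per example (36 x 2048 visual features)
    "image_feat_dim": 2048,
    "text_feat_dim": 2400,    # question embedding size
    "mutan_hidden_dim": 510,  # hidden layer size of the Mutan fusion module
    "chain_steps_K": 5,       # hyperparameter K: reasoning steps in the chain
    "attn_hidden_dim": 620,   # attention hidden unit number
    "joint_embed_dim": 510,   # joint feature embedding in the decision-making phase
    "dropout_p": 0.5,         # ASSUMED default; the paper does not report a rate
    "learning_rate": 1e-4,
    "batch_size": 64,
}

def nonlinear_layer(in_dim: int, out_dim: int) -> nn.Sequential:
    """A nonlinear layer as described: linear map, ReLU activation, then dropout."""
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),
        nn.ReLU(),
        nn.Dropout(CONFIG["dropout_p"]),
    )

# Optimizer choice as reported: Adam with a learning rate of 1e-4.
model = nonlinear_layer(CONFIG["image_feat_dim"], CONFIG["joint_embed_dim"])  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=CONFIG["learning_rate"])
```

Collecting the reported values in a single dictionary like this makes it straightforward to audit a reproduction attempt against the paper's stated numbers.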