Differential Networks for Visual Question Answering

Authors: Chenfei Wu, Jinlai Liu, Xiaojie Wang, Ruifan Li (pp. 8997-9004)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve state-of-the-art results on four publicly available datasets. Ablation studies also show the effectiveness of difference operations in DF model.
Researcher Affiliation | Academia | Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications; {wuchenfei, liujinlai, xjwang, rfli}@bupt.edu.cn
Pseudocode | No | The paper describes methodologies through equations and textual descriptions but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | More details, including source codes, will be published in the near future.
Open Datasets | Yes | We evaluate our model on four public datasets: the VQA 1.0 dataset (Antol et al. 2015), the VQA 2.0 dataset (Goyal et al. 2017), the COCO-QA dataset (Ren, Kiros, and Zemel 2015), and the TDIUC dataset (Kafle and Kanan 2017a).
Dataset Splits | Yes | The VQA 1.0 dataset contains a total of 614,163 samples and is divided into three splits: train (40.4%), val (19.8%), test (39.8%).
Hardware Specification | No | The paper mentions implementation in PyTorch and training parameters, but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper states 'We implement the model using Pytorch' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | During the data embedding phase, the image features are mapped to the size of 36 × 2048 and the text features are mapped to the size of 2400. In the differential fusion phase, the hidden layer size in DF is 510; hyperparameter S is 1, R is 5. The attention hidden unit number is 620. In the decision making phase, the hidden layer size in DF is 510. All the nonlinear layers of the model use the ReLU activation function and dropout (Srivastava et al. 2014) to prevent overfitting. All settings are commonly used in previous work. We implement the model using PyTorch. We use Adam (Kingma and Ba 2014) to train the model with a learning rate of 10^-4 and a batch size of 128.
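
For concreteness, the reported configuration can be assembled into a minimal PyTorch sketch. This only illustrates the quoted sizes and training settings (36 × 2048 image features, 2400-d question features, DF hidden size 510, attention hidden size 620, ReLU, dropout, Adam with learning rate 1e-4, batch size 128); the authors' differential fusion code was not released, so the FusionVQA class below, the elementwise-product fusion, the dropout probability of 0.5, and the 3000-way answer vocabulary are assumptions, not the paper's implementation.

import torch
import torch.nn as nn

# Reported sizes; the dropout rate and answer-vocabulary size are assumptions.
IMG_REGIONS, IMG_DIM = 36, 2048   # image features mapped to 36 x 2048
TXT_DIM = 2400                    # question features mapped to 2400
DF_HIDDEN = 510                   # DF hidden size (fusion and decision phases)
ATT_HIDDEN = 620                  # attention hidden units
NUM_ANSWERS = 3000                # assumed answer-vocabulary size
DROPOUT = 0.5                     # assumed dropout probability

class FusionVQA(nn.Module):
    # Generic fusion stand-in mirroring the reported layer sizes;
    # it does not reproduce the paper's differential (DF) operations.
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(IMG_DIM, DF_HIDDEN), nn.ReLU(), nn.Dropout(DROPOUT))
        self.txt_proj = nn.Sequential(nn.Linear(TXT_DIM, DF_HIDDEN), nn.ReLU(), nn.Dropout(DROPOUT))
        self.att = nn.Sequential(nn.Linear(DF_HIDDEN, ATT_HIDDEN), nn.ReLU(), nn.Linear(ATT_HIDDEN, 1))
        self.classifier = nn.Sequential(
            nn.Linear(DF_HIDDEN, DF_HIDDEN), nn.ReLU(), nn.Dropout(DROPOUT),
            nn.Linear(DF_HIDDEN, NUM_ANSWERS),
        )

    def forward(self, img_feats, q_feats):
        # img_feats: (B, 36, 2048), q_feats: (B, 2400)
        v = self.img_proj(img_feats)                     # (B, 36, 510)
        q = self.txt_proj(q_feats).unsqueeze(1)          # (B, 1, 510)
        fused = v * q                                    # elementwise fusion (assumption)
        weights = torch.softmax(self.att(fused), dim=1)  # attention over the 36 regions
        pooled = (weights * fused).sum(dim=1)            # (B, 510)
        return self.classifier(pooled)                   # (B, NUM_ANSWERS)

model = FusionVQA()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 1e-4 as reported

A forward pass at the reported batch size would consume tensors of shape (128, 36, 2048) and (128, 2400).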