Densely Connected Attention Flow for Visual Question Answering

Authors: Fei Liu, Jing Liu, Zhiwei Fang, Richang Hong, Hanqing Lu

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.
Researcher Affiliation | Academia | Fei Liu (1,2), Jing Liu (1), Zhiwei Fang (1,2), Richang Hong (3), Hanqing Lu (1); (1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) School of Computer and Information, Hefei University of Technology
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that the code for the described method is open-sourced nor provides a link to a code repository.
Open Datasets | Yes | We use the VQA 1.0 [Antol et al., 2015], VQA 2.0 [Goyal et al., 2017] and TDIUC [Kafle and Kanan, 2017] datasets for our experiments.
Dataset Splits | Yes | VQA 1.0 is built from 204,721 MSCOCO images with human-annotated questions and answers. The dataset is divided into three splits: train (248,349 questions), val (121,512 questions) and test (244,302 questions). VQA 2.0 is an updated version of VQA 1.0 and contains more samples (443,757 train, 214,354 val, and 447,793 test questions). The reported splits are summarized in a short sketch after this table.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using GloVe for word embeddings, Faster R-CNN for feature extraction, and AMSGrad as the optimizer, but does not specify version numbers for any software libraries or dependencies.
Experiment Setup | Yes | The model is trained using the AMSGrad [Reddi et al., 2018] optimizer with an initial learning rate of 6 × 10⁻⁴. The batch size is set to 128. See the optimizer sketch after this table.
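
For quick reference, the reported question counts can be transcribed into a small Python dictionary. This is only a summary of the numbers quoted in the Dataset Splits row, not code from the paper; TDIUC split sizes are not quoted there, so that dataset is omitted.

```python
# Question counts per split, transcribed from the paper's reported
# numbers for VQA 1.0 and VQA 2.0 (TDIUC sizes are not quoted above).
SPLITS = {
    "VQA 1.0": {"train": 248_349, "val": 121_512, "test": 244_302},
    "VQA 2.0": {"train": 443_757, "val": 214_354, "test": 447_793},
}

for dataset, splits in SPLITS.items():
    total = sum(splits.values())
    print(f"{dataset}: {splits} -> {total:,} questions total")
```

Running this prints totals of 614,163 questions for VQA 1.0 and 1,105,904 for VQA 2.0, consistent with VQA 2.0 being described as the larger, updated version.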
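
The reported training configuration (AMSGrad, initial learning rate 6 × 10⁻⁴, batch size 128) maps directly onto PyTorch's built-in AMSGrad variant of Adam. The sketch below is one plausible way to set it up; PyTorch itself is an assumption (the paper does not name its framework), and the model and dataset objects are hypothetical placeholders since no code is released.

```python
import torch

# Hypothetical placeholders: the paper releases no code, so the actual
# model and dataset classes are unknown.
model = torch.nn.Linear(2048, 3000)  # stand-in for the VQA model
dataset = torch.utils.data.TensorDataset(
    torch.randn(1024, 2048), torch.randint(0, 3000, (1024,))
)

# Batch size 128, as reported in the paper.
loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)

# AMSGrad [Reddi et al., 2018] is exposed in PyTorch as a flag on Adam;
# the initial learning rate of 6e-4 matches the reported setup.
optimizer = torch.optim.Adam(model.parameters(), lr=6e-4, amsgrad=True)

criterion = torch.nn.CrossEntropyLoss()
for features, answers in loader:
    optimizer.zero_grad()
    loss = criterion(model(features), answers)
    loss.backward()
    optimizer.step()
```

Any learning-rate schedule, number of epochs, or regularization would be further assumptions, as the quoted setup does not specify them.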