Densely Connected Attention Flow for Visual Question Answering
Authors: Fei Liu, Jing Liu, Zhiwei Fang, Richang Hong, Hanqing Lu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance. |
| Researcher Affiliation | Academia | Fei Liu¹,², Jing Liu¹, Zhiwei Fang¹,², Richang Hong³, Hanqing Lu¹. ¹National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; ²University of Chinese Academy of Sciences; ³School of Computer and Information, Hefei University of Technology |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We use the VQA 1.0 [Antol et al., 2015], VQA 2.0 [Goyal et al., 2017] and TDIUC [Kafle and Kanan, 2017] datasets for our experiments. |
| Dataset Splits | Yes | VQA 1.0 is built from 204,721 MSCOCO images with human-annotated questions and answers. The dataset is divided into three splits: train (248,349 questions), val (121,512 questions) and test (244,302 questions). VQA 2.0 is an updated version of VQA 1.0 that contains more samples (443,757 train, 214,354 val, and 447,793 test questions). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using GloVe for word embeddings, Faster R-CNN for feature extraction, and AMSGrad as the optimizer, but does not specify version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | The model is trained using the AMSGrad [Reddi et al., 2018] optimizer with an initial learning rate of 6 × 10⁻⁴. The batch size is set to 128. A minimal sketch of this optimizer setup follows the table. |
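
The setup row names the optimizer, learning rate, and batch size but no framework or code. Below is a minimal sketch of that training configuration, assuming PyTorch, where AMSGrad is exposed as the `amsgrad` flag on `Adam`. The model here is a hypothetical placeholder (the paper's densely connected attention architecture is not open-sourced), and the feature dimension (2048) and answer-vocabulary size (3000) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's VQA model: a linear head mapping
# 2048-d visual features to logits over a 3000-answer vocabulary.
model = nn.Linear(2048, 3000)
criterion = nn.CrossEntropyLoss()

# AMSGrad [Reddi et al., 2018] with the reported initial learning rate.
# PyTorch implements AMSGrad as a variant flag on the Adam optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=6e-4, amsgrad=True)

# One optimization step on dummy data at the reported batch size of 128.
features = torch.randn(128, 2048)
labels = torch.randint(0, 3000, (128,))
loss = criterion(model(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Since the paper gives no learning-rate schedule or epoch count in the quoted setup, the sketch stops at a single step; any schedule on top of this would be a further assumption.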