Feature Enhancement in Attention for Visual Question Answering

Authors: Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang

IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the largest VQA v2.0 benchmark dataset and achieve competitive results without additional training data, and prove the effectiveness of our proposed feature-enhanced attention by visual demonstrations. |
| Researcher Affiliation | Academia | Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang; College of Computer Science, Zhejiang University, Hangzhou, P. R. China ({linyuetan,pzy,dhwang,yzhuang}@zju.edu.cn) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | Yes | We trained our model and conducted comparative experiments on the VQA v2.0 [Goyal et al., 2017] dataset. |
| Dataset Splits | Yes | 10 human-labeled answer annotations per question are provided for the training and validation splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions various models and optimizers (e.g., GloVe, GRU, RMSprop, Faster R-CNN) but does not name specific software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | The learning rate is initialized to 3e-4 and kept fixed for the first 40 epochs, then decayed every 10 epochs by a decay factor. The main differences between the two model versions are the attention, the dropout usage, and the learning-rate schedule: the base-att model applies dropout of 0.5 only after the word-embedding layer, before generating the attention weights, and before generating the answer, while double-att applies dropout of 0.5 before every linear layer. The learning-rate decay factors of the two versions are 0.8 and 0.9, respectively. |
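The Experiment Setup row describes a step schedule: the learning rate starts at 3e-4, stays fixed for the first 40 epochs, and is then multiplied by a decay factor (0.8 for base-att, 0.9 for double-att) every 10 epochs. A minimal sketch of that schedule, assuming the first decay is applied at epoch 40 (the paper as excerpted does not pin down the exact boundary):

```python
def learning_rate(epoch, base_lr=3e-4, decay_factor=0.8):
    """Step learning-rate schedule described in the paper's setup:
    fixed for the first 40 epochs, then decayed every 10 epochs by
    `decay_factor` (0.8 for base-att, 0.9 for double-att)."""
    if epoch < 40:
        return base_lr
    # Number of completed 10-epoch decay steps from epoch 40 onward.
    steps = (epoch - 40) // 10 + 1
    return base_lr * decay_factor ** steps
```

For example, `learning_rate(0)` returns the initial 3e-4, while `learning_rate(40)` returns 3e-4 scaled once by the decay factor; passing `decay_factor=0.9` models the double-att variant.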