Feature Enhancement in Attention for Visual Question Answering
Authors: Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the largest VQA v2.0 benchmark dataset and achieve competitive results without additional training data, and prove the effectiveness of our proposed feature-enhanced attention by visual demonstrations. |
| Researcher Affiliation | Academia | Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang College of Computer Science, Zhejiang University Hangzhou, P. R. China {linyuetan,pzy,dhwang,yzhuang}@zju.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | Yes | We trained our model and conducted comparative experiments on VQA v2.0 [Goyal et al., 2017] dataset. |
| Dataset Splits | Yes | 10 human-labeled answer annotations per question are provided for training and validation splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions various models and optimizers (e.g., GloVe, GRU, RMSprop, Faster R-CNN) but does not provide specific software dependency names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | Yes | The learning rate is initialized to 3e-4 and kept fixed for the first 40 epochs, and is decayed every 10 epochs with a decay factor. The main differences between the two versions are the attention, the dropout usage and the learning-rate schedule. The base-att model uses dropout of 0.5 only after the word embedding layer, before generating the attention weights and before generating the answer, while the double-att model uses dropout of 0.5 before every linear layer. The learning rate decay factors of the two versions are 0.8 and 0.9, respectively. |
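
The learning-rate schedule quoted in the Experiment Setup row can be summarized with a minimal sketch. The paper releases no code, so the function name, the `double_att` flag, and the assumption that the first decay is applied at epoch 40 are hypothetical; only the initial rate (3e-4), the 40-epoch fixed period, the 10-epoch decay interval, and the 0.8/0.9 decay factors come from the quoted text.

```python
# Minimal sketch of the reported learning-rate schedule (hypothetical names;
# the exact epoch at which the first decay is applied is an assumption).
def learning_rate(epoch: int, double_att: bool = False) -> float:
    initial_lr = 3e-4                      # initial learning rate from the paper
    decay = 0.9 if double_att else 0.8     # double-att vs. base-att decay factor
    if epoch < 40:                         # kept fixed for the first 40 epochs
        return initial_lr
    decay_steps = (epoch - 40) // 10 + 1   # decayed every 10 epochs thereafter
    return initial_lr * decay ** decay_steps

# Example: base-att rates at epochs 0, 39, 40, 50, 60
rates = [learning_rate(e) for e in (0, 39, 40, 50, 60)]
# -> [3e-4, 3e-4, 2.4e-4, 1.92e-4, 1.536e-4]
```

Under these assumptions, the base-att and double-att runs differ only in the decay factor; the dropout placement described in the row is orthogonal to the schedule and is not modeled here.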