Multi-Question Learning for Visual Question Answering

Authors: Chenyi Lei, Lei Wu, Dong Liu, Zhao Li, Guoxin Wang, Haihong Tang, Houqiang Li

AAAI 2020, pp. 11328-11335

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on public datasets show the favorable performance of the proposed MQL-VQA framework compared to state-of-the-art methods.
Researcher Affiliation | Collaboration | Chenyi Lei (1,2), Lei Wu (3), Dong Liu (1), Zhao Li (2), Guoxin Wang (2), Haihong Tang (2), Houqiang Li (1). Affiliations: 1. University of Science and Technology of China; 2. Alibaba Group; 3. Zhejiang University.
Pseudocode | No | The paper describes the proposed framework and its components in text and diagrams, but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to the source code for the described methodology, nor does it explicitly state that the code will be made publicly available.
Open Datasets | Yes | TGIF-QA, a large-scale public dataset introduced by (Jang et al. 2017), and CPT-QA, a recent Video QA dataset released by Zhejiang Lab.
Dataset Splits | No | For the CPT-QA dataset, "we divide its training set into two parts: randomly selecting 90% video sequences for training and using the rest for the test." For TGIF-QA, "We follow the training and test dataset of TGIF-QA." No explicit percentages or counts are provided for a validation split on either dataset. (A minimal split sketch follows this table.)
Hardware Specification | No | The paper states "All approaches are implemented on TensorFlow, utilizing 100 parameter servers and 2000 workers, each of which runs with 15 CPU cores." This describes the computational setup but does not specify exact hardware models such as particular GPU or CPU types. (A cluster-configuration sketch follows this table.)
Software Dependencies | No | The paper names TensorFlow as the implementation framework but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | The batch size of the proposed MQL model is set to 16, all models are trained for 30 epochs, the Adam optimizer is used for all approaches, and the initial learning rate is set to 1e-4. (A training-configuration sketch follows this table.)
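
For the Dataset Splits row, the following is a minimal sketch of the video-level 90%/10% random split described for CPT-QA. The function name, the fixed random seed, and the placeholder video IDs are illustrative assumptions; the paper does not say how the random selection is seeded or implemented.

```python
import random

def split_video_ids(video_ids, train_ratio=0.9, seed=0):
    """Randomly assign whole video sequences to train/test (90%/10%)."""
    rng = random.Random(seed)          # assumed fixed seed; not stated in the paper
    ids = list(video_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# Hypothetical usage with placeholder IDs, not the real CPT-QA identifiers.
train_ids, test_ids = split_video_ids([f"video_{i:05d}" for i in range(10000)])
print(len(train_ids), len(test_ids))   # 9000 1000
```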
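For the Hardware Specification row, a rough sketch of how a TensorFlow cluster with 100 parameter servers and 2000 workers might be declared is shown below. The host names and port are placeholders; only the counts come from the paper, and the actual distribution strategy, device placement, and per-worker CPU-core allocation are not described.

```python
import tensorflow as tf

# Placeholder addresses; only the counts (100 PS, 2000 workers) come from the paper.
ps_hosts = [f"ps-{i}.example.internal:2222" for i in range(100)]
worker_hosts = [f"worker-{i}.example.internal:2222" for i in range(2000)]

cluster_spec = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})
print(cluster_spec.num_tasks("ps"), cluster_spec.num_tasks("worker"))  # 100 2000
```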
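Finally, the Experiment Setup row translates into a small training-configuration sketch. Only the batch size (16), number of epochs (30), optimizer (Adam), and initial learning rate (1e-4) come from the paper; the model, loss, and data pipeline below are hypothetical placeholders, since the MQL-VQA architecture itself is not reproduced here.

```python
import tensorflow as tf

# Hyperparameters reported in the paper.
BATCH_SIZE = 16
EPOCHS = 30
LEARNING_RATE = 1e-4

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

def train(model, train_dataset):
    """Assumed Keras-style training loop; `model` and `train_dataset`
    (a tf.data.Dataset of (features, answer) pairs) are placeholders."""
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_dataset.batch(BATCH_SIZE), epochs=EPOCHS)
```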