Automatic Generation of Grounded Visual Questions

Authors: Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate our model as well as the most competitive baseline with three kinds of measures adapted from the ones commonly used in the tasks of image caption generation and machine translation. The experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin. (A sketch of one such caption/MT-style measure appears after the table.)
Researcher Affiliation | Academia | Shijie Zhang¹, Lizhen Qu², Shaodi You³, Zhenglu Yang⁴, Jiawan Zhang⁵ (¹School of Computer Science and Technology, Tianjin University, Tianjin, China; ²Data61-CSIRO, Canberra, Australia; ³Australian National University, Canberra, Australia; ⁴College of Computer and Control Engineering, Nankai University, Tianjin, China; ⁵School of Computer Software, Tianjin University, Tianjin, China)
Pseudocode | No | The paper describes its methods and processes using textual descriptions and mathematical equations, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We conduct our experiments on two datasets: VQA-Dataset [Antol et al., 2015] and Visual7W [Zhu et al., 2015]. The images in those datasets are sampled from the MS-COCO dataset [Lin et al., 2014]. (A loading sketch for the VQA annotation files follows the table.)
Dataset Splits | No | The paper mentions tuning hyperparameters 'on the validation sets' but does not provide specific details on the dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions various models and algorithms used (e.g., GloVe, VGG-16, DenseCap, LSTM, Adam) and cites their original papers, but it does not provide specific version numbers for any software, libraries, or frameworks used in its implementation. (A GloVe-loading sketch follows the table.)
Experiment Setup | Yes | We fix the batch size to 64. We set the maximal epochs to 64 for Visual7W and the maximal epochs to 128 for VQA. The corresponding model hyperparameters were tuned on the validation sets. Herein, we set α = 0.75. (The reported values are collected in the configuration sketch below.)
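
The Research Type row notes that evaluation uses measures adapted from image caption generation and machine translation. As an illustration only, here is a minimal sentence-level BLEU sketch using NLTK; the specific metric variants, reference sets, and tokenization the authors used are not stated in this report, so every detail below is an assumption.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu(references, hypothesis):
    """Sentence-level BLEU for one generated question."""
    refs = [r.lower().split() for r in references]
    hyp = hypothesis.lower().split()
    # Smoothing avoids zero scores when short questions share no 4-grams.
    smooth = SmoothingFunction().method1
    return sentence_bleu(refs, hyp, smoothing_function=smooth)

# Hypothetical example: one human reference vs. one generated question.
print(bleu(["what color is the dog ?"], "what color is the cat ?"))
```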
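For the Open Datasets row: a minimal sketch of reading questions from the VQA dataset named above, assuming the file name and JSON layout of the public VQA v1 release rather than anything specified in the paper.

```python
import json

# File name follows the public VQA v1 release; adjust the path as needed.
with open("OpenEnded_mscoco_train2014_questions.json") as f:
    vqa = json.load(f)

# Each entry pairs an MS-COCO image id with a human-written question.
for q in vqa["questions"][:3]:
    print(q["image_id"], q["question"])
```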
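For the Software Dependencies row: the paper names GloVe embeddings but not the variant used. The sketch below reads vectors from the standard glove.6B text release; the file name and dimensionality are assumptions.

```python
import numpy as np

def load_glove(path="glove.6B.300d.txt"):
    """Read GloVe vectors from the standard whitespace-separated format."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors
```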
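For the Experiment Setup row: a hypothetical PyTorch-style configuration collecting the hyperparameters the paper does report (batch size 64, per-dataset maximal epochs, α = 0.75). The paper states neither a framework nor a learning rate, and its model is not released, so the module and optimizer settings below are placeholders.

```python
import torch
import torch.nn as nn

BATCH_SIZE = 64                            # "We fix the batch size to 64."
MAX_EPOCHS = {"Visual7W": 64, "VQA": 128}  # per-dataset epoch budgets
ALPHA = 0.75                               # "Herein, we set alpha = 0.75."

# Stand-in for the paper's (unreleased) question-generation model.
model = nn.LSTM(input_size=300, hidden_size=512, batch_first=True)
# The paper cites Adam but reports no learning rate; 1e-3 is a guess.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```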