Learning to Generate Visual Questions with Noisy Supervision

Authors: Kai Shen, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, Bo Long

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on two benchmark datasets show that our proposed method outperforms the state-of-the-art approaches by a large margin on a variety of metrics, including both automatic machine metrics and human evaluation.
Researcher Affiliation | Collaboration | Kai Shen, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, and Bo Long. Zhejiang University; JD.COM. shenkai@zju.edu.cn, lwu@email.wm.edu, {siliang,yzhuang}@zju.edu.cn, {bjhezhen,dingzhuoye,xiaoyun1,bo.long}@jd.com
Pseudocode | No | The paper provides mathematical formulations and descriptions of modules but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and data for our model are provided for research purposes: DH-GAN for VQG GitHub repo.
Open Datasets | Yes | We conduct the experiments on the VQA2.0 [3] and COCO-QA [39] datasets.
Dataset Splits | Yes | After pre-processing, VQA2.0 has 278707/135584 examples and COCO-QA has 58979/29017 examples for the training/validation splits, respectively. Since the test splits for these two datasets are not publicly available, we divide the validation set into a 10% validation split and a 90% test split. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software tools like 'pre-trained Faster-RCNN' but does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | For text data, we truncate the questions longer than 20 words and build the vocabulary on the words with at least 3 occurrences. We train it with the cross-entropy loss function denoted as Llm. The loss function of the generator derived from Eq. 11 can be written as: Lrl = [R({I, Q, A}) − R({I, Q̂, A})][log P(Q|I, A, V) + β log P(V|I, A)], where β is the hyper-parameter, log P(Q|I, A, V) is the question generation loss with target question Q in Sec. 2.2.2, and log P(V|I, A) is the visual hints prediction loss given target visual hints V in Sec. 2.2.1. Practically, we find that it is unstable to update the generator by minimizing the loss Lrl. Thus we combine both the teacher-forcing loss Lsup in Eq. 6 and the reinforcement loss as: LG = γLrl + (1 − γ)Lsup, where γ is a scaling factor controlling the trade-off between teacher-forcing loss and RL loss. (A sketch of this combined loss is given below the table.)
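The Experiment Setup row combines a reinforcement term with a teacher-forcing term via LG = γLrl + (1 − γ)Lsup. Below is a minimal PyTorch-style sketch of that combination, assuming the reward difference and the two per-example loss terms are already computed; the function name, tensor shapes, and default hyper-parameter values are illustrative assumptions, not the authors' implementation.

```python
import torch

def combined_generator_loss(reward_target, reward_generated,
                            q_gen_loss, vh_loss, sup_loss,
                            beta=1.0, gamma=0.5):
    """Sketch of L_G = gamma * L_rl + (1 - gamma) * L_sup.

    reward_target / reward_generated: R({I, Q, A}) and R({I, Q_hat, A}),
        one reward per example (shape [batch]).
    q_gen_loss: the question-generation term log P(Q | I, A, V) per example.
    vh_loss:    the visual-hints prediction term log P(V | I, A) per example.
    sup_loss:   the teacher-forcing loss L_sup (scalar).
    beta, gamma: hyper-parameters; the default values here are assumptions.
    """
    # The reward difference acts as a non-differentiable scaling factor.
    advantage = (reward_target - reward_generated).detach()
    # L_rl = [R({I,Q,A}) - R({I,Q_hat,A})] * [log P(Q|I,A,V) + beta * log P(V|I,A)]
    rl_loss = (advantage * (q_gen_loss + beta * vh_loss)).mean()
    # L_G = gamma * L_rl + (1 - gamma) * L_sup
    return gamma * rl_loss + (1.0 - gamma) * sup_loss

# Illustrative call with random tensors (batch of 4):
loss = combined_generator_loss(torch.rand(4), torch.rand(4),
                               torch.rand(4), torch.rand(4),
                               torch.tensor(1.2))
```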
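The Dataset Splits row further partitions each official validation set into a 10% validation split and a 90% test split. A minimal sketch of such a partition, assuming a seeded random shuffle (the paper does not state how the partition is drawn); the function name is hypothetical.

```python
import random

def split_validation(example_ids, val_fraction=0.1, seed=0):
    """Split the official validation set into a 10% validation split
    and a 90% test split. The shuffle and seed are assumptions."""
    ids = list(example_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[:n_val], ids[n_val:]

# Illustrative call with the reported VQA2.0 validation size (135584 examples):
val_ids, test_ids = split_validation(range(135584))
print(len(val_ids), len(test_ids))  # 13558 122026
```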