Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Authors: Xi Zhu, Zhendong Mao, Chunxiao Liu, Peng Zhang, Bin Wang, Yongdong Zhang

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method achieves state-of-the-art performance, improving the overall accuracy from 49.50% to 57.59% on the most commonly used benchmark VQA-CP v2.
Researcher Affiliation | Collaboration | 1) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; 3) University of Science and Technology of China, Hefei, China; 4) Xiaomi AI Lab, Xiaomi Inc., Beijing, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available on GitHub: https://github.com/CrossmodalGroup/SSL-VQA
Open Datasets | Yes | Our approach is evaluated on the most commonly used benchmark VQA-CP v2 [Agrawal et al., 2018] with the standard evaluation metric [Antol et al., 2015].
Dataset Splits | Yes | The VQA-CP v2 dataset is derived from VQA v2 [Goyal et al., 2017] by reorganizing the train and validation splits, so that the Q-A pairs in the training set and the test set have different distributions. ... We also evaluate our model on the VQA v2 dataset, which contains strong biases, and report the results on its validation split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software components such as Faster R-CNN, GloVe embeddings, GRU, and the Adam optimizer, but does not specify their version numbers or the versions of the underlying frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | We pre-train the model with the VQA loss for 12 epochs and fine-tune it with the self-supervised loss for 20 epochs. The batch size is 256, and the irrelevant images are randomly selected from mini-batches. The Adam optimizer is adopted with an initial learning rate of 0.001, which is halved every 5 epochs after 10 epochs. We evaluate our approach with different VQA losses in our main experiment, setting α = 3 for the multi-label VQA loss and α = 1.2 for the cross-entropy VQA loss.
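
For reference, the schedule described in the Experiment Setup row can be sketched as a short training loop. The snippet below is a minimal illustration under assumed interfaces: the `model`, `loader`, `vqa_loss_fn`, and `ssl_loss_fn` names and signatures are hypothetical, and weighting the self-supervised term by α is an assumption about how the two losses are combined. It is not the authors' released implementation, which is available at the GitHub link above.

```python
# Minimal sketch of the reported training schedule (hypothetical model/loss/loader
# names; NOT the authors' code, only an illustration of the reported hyperparameters).
import torch

BATCH_SIZE = 256
PRETRAIN_EPOCHS = 12   # pre-training with the VQA loss only
FINETUNE_EPOCHS = 20   # fine-tuning with the added self-supervised loss
INIT_LR = 1e-3
ALPHA = 3.0            # 3 for the multi-label VQA loss, 1.2 for the cross-entropy variant

def train(model, loader, vqa_loss_fn, ssl_loss_fn, alpha=ALPHA):
    optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR)
    # "Halved every 5 epochs after 10 epochs": factor 1.0 for epochs 0-9,
    # then 0.5, 0.25, ... for each subsequent 5-epoch block.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lambda epoch: (0.5 ** ((epoch - 10) // 5 + 1)) if epoch >= 10 else 1.0,
    )

    for epoch in range(PRETRAIN_EPOCHS + FINETUNE_EPOCHS):
        for images, questions, answers in loader:   # assumed (image, question, answer) batches
            logits = model(images, questions)
            loss = vqa_loss_fn(logits, answers)      # VQA loss used in both stages
            if epoch >= PRETRAIN_EPOCHS:
                # Irrelevant images are obtained by shuffling the current mini-batch;
                # the resulting self-supervised term is weighted by alpha.
                shuffled = images[torch.randperm(images.size(0))]
                loss = loss + alpha * ssl_loss_fn(model(shuffled, questions))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```

One judgment call in this sketch is whether the learning-rate schedule counts epochs across both stages or restarts when fine-tuning begins; the quoted setup does not say, so the sketch simply counts total epochs.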