Chop Chop BERT: Visual Question Answering by Chopping VisualBERT’s Heads

Authors: Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | As shown in the interesting echelon shape of the result matrices, experiments reveal that different heads and layers are responsible for different question types, with higher-level layers activated by higher-level visual-reasoning questions. The experiments are based on VisualBERT, chosen for its general Transformer-style architecture without extra designs. |
| Researcher Affiliation | Academia | 1. School of Computer Science, Northwestern Polytechnical University, Xi'an, China; 2. School of Software, Northwestern Polytechnical University, Xi'an, China; 3. National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, China; 4. University of Adelaide, Australia |
| Pseudocode | No | The paper describes its methods but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | All experiments are conducted on the Task Driven Image Understanding Challenge (TDIUC) [Kafle and Kanan, 2017a] dataset, a large VQA dataset proposed to compensate for the bias in the distribution of question types in VQA 2.0 [Goyal et al., 2017]. |
| Dataset Splits | No | The paper mentions using the TDIUC dataset and fine-tuning, but does not explicitly provide training/validation/test splits with percentages or counts. |
| Hardware Specification | Yes | Experiments are conducted on 4 NVIDIA GeForce 2080Ti GPUs with a batch size of 480. |
| Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | The model pre-trained on the COCO Caption [Chen et al., 2015] dataset is loaded, then fine-tuned on TDIUC with a learning rate of 5e-5. The maximal learning rate is 1e-3 and the batch size is 480. |
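The "chopping" in the paper's title refers to masking out individual attention heads and observing how accuracy on each question type changes. A minimal sketch of that idea, assuming per-head outputs have already been computed (all names and shapes here are hypothetical, not taken from the paper's code):

```python
def chop_heads(head_outputs, keep_mask):
    """Zero out the outputs of 'chopped' heads before they would be
    concatenated in a multi-head attention layer.

    head_outputs: one output vector (list of floats) per attention head
    keep_mask:    one 0/1 flag per head (0 = head is chopped)
    """
    return [
        [x * keep for x in out]          # masked heads contribute nothing
        for out, keep in zip(head_outputs, keep_mask)
    ]

# Example: 4 heads with 2-dim outputs; chop heads 2 and 4
outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
chopped = chop_heads(outputs, [1, 0, 1, 0])
```

Repeating this per head and per question type would yield a matrix of accuracy changes like the "echelon shape" result matrices the report quotes.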