Chop Chop BERT: Visual Question Answering by Chopping VisualBERT’s Heads
Authors: Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As shown in the interesting echelon shape of the result matrices, experiments reveal that different heads and layers are responsible for different question types, with higher-level layers activated by higher-level visual reasoning questions. The experiments are based on VisualBERT, chosen for its general Transformer-style architecture without extra task-specific designs. (A hedged sketch of the head-masking operation follows the table.) |
| Researcher Affiliation | Academia | 1. School of Computer Science, Northwestern Polytechnical University, Xi'an, China; 2. School of Software, Northwestern Polytechnical University, Xi'an, China; 3. National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, China; 4. University of Adelaide, Australia |
| Pseudocode | No | The paper describes its methods but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | All of our experiments are conducted on the Task Driven Image Understanding Challenge (TDIUC) [Kafle and Kanan, 2017a] dataset, a large VQA dataset. This dataset was proposed to compensate for the bias in distribution of different question types of VQA 2.0 [Goyal et al., 2017]. |
| Dataset Splits | No | The paper mentions using the TDIUC dataset and fine-tuning but does not explicitly provide details about training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | Experiments are conducted on 4 NVIDIA GeForce 2080Ti GPUs with a batch size of 480. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We load the model pre-trained on the COCO Caption [Chen et al., 2015] dataset, then fine-tune it with a learning rate of 5e-5 on the TDIUC dataset. The maximal learning rate is 1e-3 and the batch size is 480. (A hedged reconstruction of this setup follows the table.) |
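
The "chopping" described in the Research Type row amounts to masking individual attention heads and observing the effect per question type. Below is a minimal PyTorch sketch of that idea, assuming the mask is applied to the concatenated per-head outputs of a self-attention layer; `chop_heads` and `head_mask` are illustrative names, not the authors' code (none is released).

```python
import torch

def chop_heads(attn_output: torch.Tensor, head_mask: torch.Tensor) -> torch.Tensor:
    """Zero out ("chop") selected attention heads.

    attn_output: (batch, seq_len, hidden) output of a self-attention layer.
    head_mask:   (num_heads,) binary mask; 0 chops a head, 1 keeps it.
    """
    batch, seq_len, hidden = attn_output.shape
    num_heads = head_mask.numel()
    head_dim = hidden // num_heads
    # Split the hidden dimension into per-head slices, mask, and flatten back.
    per_head = attn_output.view(batch, seq_len, num_heads, head_dim)
    per_head = per_head * head_mask.view(1, 1, num_heads, 1)
    return per_head.reshape(batch, seq_len, hidden)

# Example: chop 2 of 12 heads in one layer (BERT-base-style models use 12 heads).
x = torch.randn(2, 16, 768)
mask = torch.ones(12)
mask[3] = mask[7] = 0.0
y = chop_heads(x, mask)
```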
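
The reported training configuration (batch size 480 on 4 GPUs, fine-tuning learning rate 5e-5) maps onto a standard PyTorch loop. The sketch below is an assumption-laden reconstruction, not the authors' released code: `build_visualbert_vqa` and `TDIUCDataset` are hypothetical placeholders, AdamW stands in for whatever optimizer the authors used, and the relation between the 5e-5 fine-tuning rate and the quoted 1e-3 maximal rate is not specified in the paper.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader

# Hypothetical placeholders -- the paper releases no code:
# build_visualbert_vqa() would return VisualBERT pre-trained on COCO Captions;
# TDIUCDataset would wrap the TDIUC question/answer annotations.
model = build_visualbert_vqa(pretrained="coco-caption")
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3]).cuda()  # 4x GeForce 2080Ti

optimizer = AdamW(model.parameters(), lr=5e-5)  # fine-tuning rate from the paper
loader = DataLoader(TDIUCDataset(split="train"), batch_size=480, shuffle=True)

for batch in loader:
    optimizer.zero_grad()
    logits = model(batch["features"].cuda(), batch["questions"].cuda())
    loss = torch.nn.functional.cross_entropy(logits, batch["answers"].cuda())
    loss.backward()
    optimizer.step()
```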