VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Authors: Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit the downstream tasks, such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved the first place of single model on the leaderboard of the VCR benchmark. |
| Researcher Affiliation | Collaboration | Weijie Su¹,², Xizhou Zhu¹,², Yue Cao², Bin Li¹, Lewei Lu², Furu Wei², Jifeng Dai² (¹University of Science and Technology of China; ²Microsoft Research Asia) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/jackroos/VL-BERT. |
| Open Datasets | Yes | We pre-train VL-BERT on both visual-linguistic and text-only datasets. Here we utilize the Conceptual Captions dataset (Sharma et al., 2018) as the visual-linguistic corpus. It contains around 3.3 million images annotated with captions... We utilize the Books Corpus (Zhu et al., 2015) and the English Wikipedia datasets, which are also utilized in pre-training BERT. |
| Dataset Splits | Yes | The released VCR dataset consists of 265k pairs of questions, answers, and rationales, over 100k unique movie scenes (100k images). They are split into training, validation, and test sets consisting of 213k questions and 80k images, 27k questions and 10k images, and 25k questions and 10k images, respectively. |
| Hardware Specification | Yes | Pre-training is conducted on 16 Tesla V100 GPUs for 250k iterations by SGD. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2014)' but does not specify software versions for libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages. |
| Experiment Setup | Yes | In SGD, the Adam optimizer (Kingma & Ba, 2014) is applied, with a base learning rate of 2×10⁻⁵, β1 = 0.9, β2 = 0.999, weight decay of 10⁻⁴, the learning rate warmed up over the first 8,000 steps, and linear decay of the learning rate. All the parameters in VL-BERT and Fast R-CNN are jointly trained in both the pre-training and fine-tuning phases. (See the sketch of this schedule below the table.) |
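
The optimizer settings and learning-rate schedule quoted above can be expressed as a minimal PyTorch-style sketch. This is not the authors' released implementation; `model`, `WARMUP_STEPS`, and `TOTAL_STEPS` are illustrative placeholders (the paper reports 8,000 warmup steps and 250k pre-training iterations), and the task-specific forward/backward pass is omitted.

```python
# Sketch (not the authors' code) of the reported optimization setup:
# Adam with base LR 2e-5, betas (0.9, 0.999), weight decay 1e-4,
# linear warmup over the first 8,000 steps, then linear decay of the LR.
import torch

WARMUP_STEPS = 8_000
TOTAL_STEPS = 250_000  # pre-training iterations reported in the paper

model = torch.nn.Linear(768, 768)  # stand-in for the jointly trained VL-BERT + Fast R-CNN parameters

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=2e-5,
    betas=(0.9, 0.999),
    weight_decay=1e-4,
)

def warmup_then_linear_decay(step: int) -> float:
    """LR multiplier: ramps 0 -> 1 over WARMUP_STEPS, then decays linearly to 0."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_linear_decay)

for step in range(TOTAL_STEPS):
    optimizer.zero_grad()
    # loss = model(...); loss.backward()  # task-specific forward/backward omitted
    optimizer.step()
    scheduler.step()
```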