EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings
Authors: Agostina Calabrese, Michele Bevilacqua, Roberto Navigli
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments are organised in two main blocks. The first focuses on the evaluation of our proposed approach for the automatic verification of concept-image associations in both the concrete and non-concrete domains (Section 4.1). The second set of experiments, instead, assesses the effectiveness of our multimodal concept embeddings by evaluating them in the Word Sense Disambiguation task (Section 4.2). |
| Researcher Affiliation | Academia | Agostina Calabrese, Michele Bevilacqua and Roberto Navigli, Sapienza NLP Group, Department of Computer Science, Sapienza University of Rome {calabrese.a, bevilacqua, navigli}@di.uniroma1.it |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release code, dataset and embeddings at http://babelpic.org. |
| Open Datasets | Yes | To address this issue we start from BabelPic [Calabrese et al., 2020], which includes manually annotated concept-image pairs. ... Our gold dataset includes 2,733 synsets and 14,931 images. ... Our silver dataset includes 42,579 synsets and 257,499 images. ... Additionally, the paper references standard datasets like ImageNet [Deng et al., 2009], COCO [Lin et al., 2014], Flickr30k Entities [Plummer et al., 2015], Open Images [Kuznetsova et al., 2020], the VQA 2.0 dataset [Goyal et al., 2017], Conceptual Captions (CC), Visual Genome [Krishna et al., 2017; Anderson et al., 2018], the SemCor corpus, and SemEval-2015 [Moro and Navigli, 2015]. |
| Dataset Splits | Yes | We perform the splitting of the dataset according to the 80%/10%/10% rule, hence defining training, validation and test sets. ... Validation 10.18 1.98 37.84 (Table 1) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like BERT, VLP, and Faster R-CNN, but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | When training the VLP architecture on our gold dataset, we keep the same setting as in the original paper. That is, we set the number of both hidden layers and attention heads of the BERT encoder to 12. We train the model for 20 epochs with a learning rate of 2×10⁻⁵ and a dropout rate of 0.1, selecting the weights of the best epoch, i.e. the one achieving the highest F1 score on the validation set. ... We train the system on the SemCor corpus for a maximum of 10 epochs, with the Adam optimizer and a learning rate of 10⁻⁴, feeding the input in batches of 250 instances. |
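The 80%/10%/10% train/validation/test rule reported in the Dataset Splits row can be sketched as follows. This is a minimal illustration only: the paper does not specify a random seed or whether the split is performed per synset or per image, so the seeded shuffle and the item granularity here are assumptions.

```python
import random

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items deterministically and split 80/10/10 (assumed seed)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Example with 1,000 placeholder item IDs (not the paper's actual data).
train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 800 100 100
```

With 1,000 items this yields 800/100/100; with real data the remainder simply falls into the test portion, keeping the three sets disjoint.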