reproducibilityindex.ai

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Authors: Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans, illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
Researcher Affiliation	Industry	Facebook AI {dkiela,mhfirooz,aramohan,vedanuj,asg,tikir,davidet}@fb.com
Pseudocode	No	The paper includes a flowchart (Figure 2) for the annotation process, but it does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code	Yes	Starter kit code is available at https://github.com/facebookresearch/mmf/tree/master/projects/hateful_memes.
Open Datasets	Yes	A NeurIPS competition will be held based on the dataset described in this paper. The winners will be determined according to performance on a different unseen test set. For more information about the competition, please visit https://ai.facebook.com/hatefulmemes.
Dataset Splits	Yes	We construct a dev and test set from 5% and 10% of the data respectively, and set aside the rest to serve as fine-tuning training data.
Hardware Specification	No	The paper does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments. It only mentions model architectures like ResNet-152 and Faster-RCNN.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies used in the experiments. It cites PyTorch but does not specify the version used.
Experiment Setup	Yes	We performed grid search hyperparameter tuning over the learning rate, batch size, warm up and number of iterations. We report results averaged over three random seeds, together with their standard deviation. See Appendix A for more details.
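The Dataset Splits row gives only percentages (5% dev, 10% test, the remainder for training). The sketch below illustrates that arithmetic under stated assumptions: a uniform random shuffle, a hypothetical helper `split_dataset`, and a hypothetical pool of 10,000 example IDs, since the row does not state the dataset size. The authors may have constructed their splits differently, for example with class balancing.

```python
import random


def split_dataset(examples, dev_frac=0.05, test_frac=0.10, seed=0):
    """Randomly partition examples into (train, dev, test).

    The 5%/10% fractions come from the paper's description; the uniform
    shuffle and the fixed seed are illustrative assumptions.
    """
    items = list(examples)
    rng = random.Random(seed)  # seeded RNG so the split is repeatable
    rng.shuffle(items)
    n = len(items)
    n_dev = round(n * dev_frac)
    n_test = round(n * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]  # everything left over is training data
    return train, dev, test


if __name__ == "__main__":
    # Hypothetical pool of 10,000 example IDs.
    train, dev, test = split_dataset(range(10_000))
    print(len(train), len(dev), len(test))
```

With 10,000 items this yields 8,500 training, 500 dev and 1,000 test examples.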
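The Experiment Setup row describes grid search over learning rate, batch size, warmup and number of iterations, followed by reporting a mean and standard deviation over three random seeds. The sketch below assumes that configurations are selected on the dev set using a single seed and that the chosen configuration is then re-run with three seeds. The grid values, the helper names (`grid_search`, `mean_std_over_seeds`, `toy_train_and_eval`) and the fake scoring function are all hypothetical; the actual search ranges are in Appendix A and are not reproduced here.

```python
import itertools
import statistics
from typing import Callable, Dict, Sequence, Tuple

Config = Dict[str, float]
TrainFn = Callable[[Config, int], float]  # (config, seed) -> dev score


def grid_search(grid: Dict[str, Sequence[float]], train_and_eval: TrainFn,
                seed: int = 0) -> Config:
    """Try every combination in `grid`; return the best-scoring config.

    Ties keep the first config encountered (strict > comparison).
    """
    keys = list(grid)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = train_and_eval(config, seed)
        if score > best_score:
            best_config, best_score = config, score
    return best_config


def mean_std_over_seeds(config: Config, train_and_eval: TrainFn,
                        seeds: Sequence[int] = (0, 1, 2)) -> Tuple[float, float]:
    """Re-run `config` once per seed; return mean and sample std deviation."""
    scores = [train_and_eval(config, s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical search space: the tuned hyperparameters match the paper's
# description, but these candidate values are made up for illustration.
GRID = {
    "lr": [1e-5, 5e-5],
    "batch_size": [32, 64],
    "warmup_steps": [500, 2000],
    "max_updates": [11000, 22000],
}


def toy_train_and_eval(config: Config, seed: int) -> float:
    """Hypothetical stand-in for a real training run (deterministic fake score)."""
    score = 0.60
    score += 0.05 if config["lr"] == 5e-5 else 0.0
    score += 0.02 if config["batch_size"] == 32 else 0.0
    score += 0.01 * seed  # fake seed-to-seed variation
    return score


if __name__ == "__main__":
    best = grid_search(GRID, toy_train_and_eval)
    mean, std = mean_std_over_seeds(best, toy_train_and_eval)
    print(best)
    print(f"{mean:.3f} +/- {std:.3f}")
```

`statistics.stdev` computes the sample standard deviation (n - 1 denominator); the row does not say which estimator the authors used, so swap in `statistics.pstdev` if a population estimate is intended.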