The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Authors: Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans, illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
Researcher Affiliation | Industry | Facebook AI {dkiela,mhfirooz,aramohan,vedanuj,asg,tikir,davidet}@fb.com
Pseudocode | No | The paper includes a flowchart (Figure 2) for the annotation process, but it does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | Starter kit code is available at https://github.com/facebookresearch/mmf/tree/master/projects/hateful_memes.
Open Datasets | Yes | A NeurIPS competition will be held based on the dataset described in this paper. The winners will be determined according to performance on a different unseen test set. For more information about the competition, please visit https://ai.facebook.com/hatefulmemes.
Dataset Splits | Yes | We construct a dev and test set from 5% and 10% of the data respectively, and set aside the rest to serve as fine-tuning training data.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments. It only mentions model architectures like ResNet-152 and Faster-RCNN.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies used in the experiments. It cites PyTorch but does not specify the version used.
Experiment Setup | Yes | We performed grid search hyperparameter tuning over the learning rate, batch size, warm up and number of iterations. We report results averaged over three random seeds, together with their standard deviation. See Appendix A for more details.
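
The Dataset Splits and Experiment Setup entries describe a 5% dev / 10% test / remainder train partition and results reported as a mean and standard deviation over three random seeds. The sketch below illustrates both steps under stated assumptions; it is not the authors' code. The function names `split_dataset` and `summarize_over_seeds` are hypothetical, the split is assumed to be a plain random shuffle (the paper may apply additional constraints), and the sample standard deviation is assumed because the excerpt does not say which estimator was used.

```python
import random
import statistics


def split_dataset(items, dev_frac=0.05, test_frac=0.10, seed=0):
    """Partition items into (train, dev, test).

    Assumption: a uniform random shuffle. The fractions follow the
    5% dev / 10% test figures quoted above; everything else is train.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, reproducible per seed
    n = len(shuffled)
    n_dev = round(n * dev_frac)
    n_test = round(n * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


def summarize_over_seeds(scores):
    """Return (mean, sample standard deviation) of per-seed scores.

    Assumption: sample (n - 1) standard deviation; needs >= 2 scores.
    """
    return statistics.mean(scores), statistics.stdev(scores)
```

A call such as `split_dataset(range(1000))` yields 850 training, 50 dev, and 100 test items. `summarize_over_seeds` would be applied to the three per-seed scores of one model configuration; any scores passed to it here would be placeholders, not results from the paper.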