The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Authors: Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans, illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community. |
| Researcher Affiliation | Industry | Facebook AI {dkiela,mhfirooz,aramohan,vedanuj,asg,tikir,davidet}@fb.com |
| Pseudocode | No | The paper includes a flowchart (Figure 2) for the annotation process, but it does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Starter kit code is available at https://github.com/facebookresearch/mmf/tree/master/projects/hateful_memes. |
| Open Datasets | Yes | A NeurIPS competition will be held based on the dataset described in this paper. The winners will be determined according to performance on a different unseen test set. For more information about the competition, please visit https://ai.facebook.com/hatefulmemes. |
| Dataset Splits | Yes | We construct a dev and test set from 5% and 10% of the data respectively, and set aside the rest to serve as fine-tuning training data. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types) used for running the experiments. It only mentions model architectures like ResNet-152 and Faster-RCNN. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies used in the experiments. It cites PyTorch but does not specify the version used. |
| Experiment Setup | Yes | We performed grid search hyperparameter tuning over the learning rate, batch size, warm up and number of iterations. We report results averaged over three random seeds, together with their standard deviation. See Appendix A for more details. |
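
The dataset-split row above reports that dev and test sets are carved from 5% and 10% of the data, with the remainder used for fine-tuning. The following is a minimal sketch of such a proportional split; the function name, seed value, and shuffling strategy are assumptions for illustration, not details from the paper.

```python
import random

def split_dataset(examples, dev_frac=0.05, test_frac=0.10, seed=0):
    """Shuffle the examples and carve out dev/test splits; the rest is training data."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```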
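The experiment-setup row describes grid search over learning rate, batch size, warmup, and number of iterations, with results averaged over three random seeds. Below is a hedged sketch of that procedure; the grid values, seed values, and the `train_and_eval` placeholder are assumptions (the paper's Appendix A holds the actual details), and the selection metric is left abstract.

```python
import itertools
import statistics

# Hypothetical search space; the paper does not list the exact grid values here.
GRID = {
    "lr": [1e-5, 5e-5],
    "batch_size": [32, 64],
    "warmup_steps": [500, 2000],
    "max_iterations": [10000, 22000],
}
SEEDS = [13, 42, 1337]  # three random seeds, as in the paper (specific values assumed)

def train_and_eval(config, seed):
    """Placeholder: train a model with this config/seed and return a dev metric."""
    raise NotImplementedError

def grid_search():
    best = None
    for values in itertools.product(*GRID.values()):
        config = dict(zip(GRID.keys(), values))
        scores = [train_and_eval(config, seed) for seed in SEEDS]
        mean, std = statistics.mean(scores), statistics.stdev(scores)
        # Report mean and standard deviation across seeds, keeping the best config.
        if best is None or mean > best[0]:
            best = (mean, std, config)
    return best
```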