Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?
Authors: Maxime Peyrard, Beatriz Borges, Kristina Gligorić, Robert West
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We make progress in both respects by training and analyzing transformer-based humor recognition models on a recently introduced dataset consisting of minimal pairs of aligned sentences, one serious, the other humorous. We find that, although our aligned dataset is much harder than previous datasets, transformer-based models recognize the humorous sentence in an aligned pair with high accuracy (78%). In a careful error analysis, we characterize easy vs. hard instances. |
| Researcher Affiliation | Academia | Maxime Peyrard, Beatriz Borges, Kristina Gligorić and Robert West EPFL {maxime.peyrard, beatriz.borges, kristina.gligoric, robert.west}@epfl.ch |
| Pseudocode | No | The paper describes models and calculations but does not present pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data, as well as an extended version of the paper (with the appendices referenced here), are available online. https://github.com/epfl-dlab/laughing-head |
| Open Datasets | Yes | We use an extended dataset of 23,113 pairs [West and Horvitz, 2019b], which we randomly split into 18,832 training pairs, 2,414 validation pairs, and 1,867 testing pairs. Additionally, as part of the game, a subset of pairs has been annotated with quality ratings measuring how well the unfunning process worked, i.e., whether the unfunned sentence was perceived as serious by other humans. These annotations come from other players who evaluated the quality of the unfunned sentences (for details, see West and Horvitz [2019a]). From these annotations, we form a restrictive high-quality test set of instances that received the maximum score according to all annotators, consisting of 754 pairs. We later refer to this test set as HQ. Finally, a subset of the test set (254 pairs) comes with manual annotations from two trained annotators capturing the opposition that leads to humor in the pair (cf. Sec. 1). ... [West and Horvitz, 2019b] Robert West and Eric Horvitz. Unfun.me dataset. https://github.com/epfl-dlab/unfun, 2019. Accessed: 2021-01-15. |
| Dataset Splits | Yes | We use an extended dataset of 23,113 pairs [West and Horvitz, 2019b], which we randomly split into 18,832 training pairs, 2,414 validation pairs, and 1,867 testing pairs. (A hypothetical sketch of this split appears after the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments. |
| Software Dependencies | No | The paper mentions software such as BERT, GPT-2, DistilBERT, RoBERTa, FastText, LSTM, and Transformer models, but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper describes model architectures and training setups (e.g., 'fine-tuning the full pretrained model with backpropagation', 'Siamese networks'), but it does not specify concrete hyperparameters such as learning rate, batch size, or number of epochs in the main text. (A hedged sketch of one possible setup follows the table.) |
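The split reported above is a random partition of the 23,113 aligned pairs. As a minimal sketch, not the authors' code (the function name `split_pairs`, the seed, and the in-memory list of pairs are illustrative assumptions; the actual split ships with the linked repository), it could look like this:

```python
import random

def split_pairs(pairs, n_train=18_832, n_val=2_414, n_test=1_867, seed=0):
    """Randomly partition aligned (serious, humorous) pairs into train/val/test.

    Sizes match the paper's reported split of 23,113 pairs; the seed is an
    arbitrary placeholder, not the one used in the original experiments.
    """
    # 18,832 + 2,414 + 1,867 = 23,113
    assert len(pairs) == n_train + n_val + n_test
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```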
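Because the paper reports no concrete hyperparameters in its main text, any reimplementation must fill them in. The sketch below shows one plausible Siamese pairwise setup in the spirit of the description above: the model name (`bert-base-uncased`), the margin-ranking loss, the learning rate, and the helper names (`SiameseHumorScorer`, `train_step`, `pair_accuracy`) are all assumptions for illustration, not the authors' implementation (see their repository for that).

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class SiameseHumorScorer(nn.Module):
    """Score a sentence's 'funniness' with a shared pretrained encoder."""

    def __init__(self, model_name="bert-base-uncased"):  # assumed checkpoint
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # shared weights
        self.scorer = nn.Linear(self.encoder.config.hidden_size, 1)

    def score(self, enc):
        # Use the [CLS] representation as the sentence embedding.
        cls = self.encoder(**enc).last_hidden_state[:, 0]
        return self.scorer(cls).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SiameseHumorScorer()
loss_fn = nn.MarginRankingLoss(margin=1.0)               # assumed objective
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumed value

def train_step(funny_sents, serious_sents):
    """One update: push humorous sentences' scores above their serious twins'."""
    enc_f = tokenizer(funny_sents, return_tensors="pt",
                      padding=True, truncation=True)
    enc_s = tokenizer(serious_sents, return_tensors="pt",
                      padding=True, truncation=True)
    s_funny, s_serious = model.score(enc_f), model.score(enc_s)
    # target=1 means the first argument should be ranked higher.
    loss = loss_fn(s_funny, s_serious, torch.ones_like(s_funny))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def pair_accuracy(pairs):
    """Fraction of aligned pairs where the humorous sentence scores higher."""
    correct = 0
    for serious, funny in pairs:
        enc = tokenizer([serious, funny], return_tensors="pt",
                        padding=True, truncation=True)
        s = model.score(enc)
        correct += int(s[1] > s[0])
    return correct / len(pairs)
```

A ranking objective is one natural fit for the aligned-pair task, since the model only needs to score the humorous sentence above its serious counterpart; the authors may instead have used a different formulation (e.g., binary classification of single sentences). The paper's reported 78% corresponds to a pairwise accuracy like the `pair_accuracy` metric sketched here.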