Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops
Authors: Limor Gultchin, Genevieve Patterson, Nancy Baym, Nathaniel Swinger, Adam Kalai
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we show that several aspects of single-word humor correlate with simple linear directions in Word Embeddings. ... To assess the ability of WEs to explain individual word humor, we draw on a long history of humor theories and put them to the test. ... Our analysis suggests that individual-word humor indeed possesses many aspects of humor that have been discussed in general theories of humor, and that many of these aspects of humor are captured by WEs. ... While it would be easy to use Netflix-style collaborative filtering to predict humor ratings, WEs are shown to generalize: given humor ratings on some words, vector representations are able to predict mean humor ratings, humor features and differences in sense of humor on test words that have no humor ratings. |
| Researcher Affiliation | Collaboration | Limor Gultchin (1); listed affiliations: (1) University of Oxford, (2) TRASH, (3) Microsoft Research, (4) Lexington High School. Correspondence to: Limor Gultchin <limor.gultchin@jesus.ox.ac.uk>. |
| Pseudocode | No | The paper describes the methods used (e.g., K-means++, linear regression) but does not provide any pseudocode blocks or formally labeled algorithm sections. |
| Open Source Code | Yes | The list of 120,000 strings is included in the public dataset we are making available alongside this paper. https://github.com/limorigu/Cockamamie-Gobbledegook |
| Open Datasets | Yes | Our first source of data is the EH dataset, which is publicly available (Engelthaler & Hills, 2017). ... To complete this picture, we performed crowdsourcing studies to create additional datasets which we make publicly available ... The list of 120,000 strings is included in the public dataset we are making available alongside this paper. https://github.com/limorigu/Cockamamie-Gobbledegook |
| Dataset Splits | Yes | For the second column, 10-fold cross validation is used to predict the labels (so each fold is predicted separately as hold-out) and then the correlation between the predictions and EH ratings is computed. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using 'K-means++ in sci-kit learn' but does not specify a version number for scikit-learn or any other software dependencies crucial for reproduction, such as Python or specific deep learning libraries. |
| Experiment Setup | Yes | We consider several publicly-available pre-trained embeddings all using d = 300 dimensions, trained on news and web data using different algorithms. ... Hence, we begin by fitting a simple least-squares linear regression model, predicting the EH mean humor rating for each word in the dataset from its 300-dimensional embedding. ... We used K-means++ in sci-kit learn with default parameters (Pedregosa et al., 2011). |
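The evaluation protocol quoted above (least-squares regression from 300-dimensional embeddings to EH humor ratings, with 10-fold cross-validation and a correlation between hold-out predictions and observed ratings) can be sketched as follows. This is a minimal illustration on random placeholder data, not the paper's actual embeddings or ratings; all variable names are ours.

```python
# Sketch of 10-fold cross-validated linear regression from word
# embeddings to mean humor ratings, as described in the paper.
# The embeddings and ratings below are random placeholders.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_words, d = 500, 300
X = rng.normal(size=(n_words, d))       # placeholder 300-d word embeddings
y = rng.uniform(1.0, 5.0, size=n_words) # placeholder mean humor ratings

# Each fold is predicted as a held-out set; predictions are then
# correlated with the observed ratings across all words.
preds = cross_val_predict(
    LinearRegression(), X, y,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
r, _ = pearsonr(preds, y)
print(f"hold-out correlation: {r:.3f}")
```

On real data the reported correlation reflects how well embeddings generalize to words with no humor ratings; on the random placeholders here it will hover near zero.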
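The setup row also quotes the use of "K-means++ in sci-kit learn with default parameters". A minimal sketch of that call, again on random stand-ins for the word vectors (the cluster count and data are illustrative assumptions, not values from the paper):

```python
# Clustering placeholder 300-d word vectors with scikit-learn's KMeans,
# whose default initialization is k-means++.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))  # placeholder word embeddings

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster assignment for the first 10 "words"
```

Note the version caveat flagged in the "Software Dependencies" row: defaults such as `n_init` have changed across scikit-learn releases, so "default parameters" is only reproducible given a pinned version.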