Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops
Authors: Limor Gultchin, Genevieve Patterson, Nancy Baym, Nathaniel Swinger, Adam Kalai
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we show that several aspects of single-word humor correlate with simple linear directions in Word Embeddings. ... To assess the ability of WEs to explain individual word humor, we draw on a long history of humor theories and put them to the test. ... Our analysis suggests that individual-word humor indeed possesses many aspects of humor that have been discussed in general theories of humor, and that many of these aspects of humor are captured by WEs. ... While it would be easy to use Netflix-style collaborative filtering to predict humor ratings, WEs are shown to generalize: given humor ratings on some words, vector representations are able to predict mean humor ratings, humor features and differences in sense of humor on test words that have no humor ratings. |
| Researcher Affiliation | Collaboration | Limor Gultchin (1); listed affiliations: (1) University of Oxford, (2) TRASH, (3) Microsoft Research, (4) Lexington High School. Correspondence to: Limor Gultchin <limor.gultchin@jesus.ox.ac.uk>. |
| Pseudocode | No | The paper describes the methods used (e.g., K-means++, linear regression) but does not provide any pseudocode blocks or formally labeled algorithm sections. |
| Open Source Code | Yes | The list of 120,000 strings is included in the public dataset we are making available alongside this paper. https://github.com/limorigu/Cockamamie-Gobbledegook |
| Open Datasets | Yes | Our first source of data is the EH dataset, which is publicly available (Engelthaler & Hills, 2017). ... To complete this picture, we performed crowdsourcing studies to create additional datasets which we make publicly available ... The list of 120,000 strings is included in the public dataset we are making available alongside this paper. https://github.com/limorigu/Cockamamie-Gobbledegook |
| Dataset Splits | Yes | For the second column, 10-fold cross validation is used to predict the labels (so each fold is predicted separately as hold-out) and then the correlation between the predictions and EH ratings is computed. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions using 'K-means++ in sci-kit learn' but does not specify a version number for scikit-learn or any other software dependencies crucial for reproduction, such as Python or specific deep learning libraries. |
| Experiment Setup | Yes | We consider several publicly-available pre-trained embeddings all using d = 300 dimensions, trained on news and web data using different algorithms. ... Hence, we begin by fitting a simple least-squares linear regression model, predicting the EH mean humor rating for each word in the dataset from its 300-dimensional embedding. ... We used K-means++ in sci-kit learn with default parameters (Pedregosa et al., 2011). |
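The evaluation protocol quoted above (least-squares regression from 300-dimensional embeddings to EH humor ratings, with 10-fold cross-validation and a correlation between hold-out predictions and observed ratings) can be sketched as follows. This is a minimal illustration on random placeholder data, not the paper's actual embeddings or ratings; all variable names are ours.

```python
# Sketch of 10-fold cross-validated linear regression from word
# embeddings to mean humor ratings, as described in the paper.
# The embeddings and ratings below are random placeholders.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_words, d = 500, 300
X = rng.normal(size=(n_words, d))       # placeholder 300-d word embeddings
y = rng.uniform(1.0, 5.0, size=n_words) # placeholder mean humor ratings

# Each fold is predicted as a held-out set; predictions are then
# correlated with the observed ratings across all words.
preds = cross_val_predict(
    LinearRegression(), X, y,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
r, _ = pearsonr(preds, y)
print(f"hold-out correlation: {r:.3f}")
```

On real data the reported correlation reflects how well embeddings generalize to words with no humor ratings; on the random placeholders here it will hover near zero.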
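The setup row also quotes the use of "K-means++ in sci-kit learn with default parameters". A minimal sketch of that call, again on random stand-ins for the word vectors (the cluster count and data are illustrative assumptions, not values from the paper):

```python
# Clustering placeholder 300-d word vectors with scikit-learn's KMeans,
# whose default initialization is k-means++.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))  # placeholder word embeddings

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster assignment for the first 10 "words"
```

Note the version caveat flagged in the "Software Dependencies" row: defaults such as `n_init` have changed across scikit-learn releases, so "default parameters" is only reproducible given a pinned version.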