Bayesian Verb Sense Clustering

Authors: Daniel Peterson, Martha Palmer

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Relative to the prior state of the art, we improve accuracy on verb sense induction by over 20% absolute F1. ... Our best model shows a 4.5% absolute F1 improvement over the best non-PPMI model, with over an order of magnitude less computation time. Table 1 shows the clustering mPU, iPU, and F1 score (simple harmonic mean of mPU and iPU) for senses induced from various models (trained on Gigaword or Google Books syntactic n-grams corpora, with 100 and 200 topics)." (The F1 definition is written out below the table.)
Researcher Affiliation | Academia | Daniel W. Peterson, Martha Palmer, University of Colorado, {daniel.w.peterson,martha.palmer}@colorado.edu
Pseudocode | Yes | Algorithm 1: Sampling verb senses in the Dirichlet Multinomial mixture; Algorithm 2: Sampling verb senses with common topics; Algorithm 3: Clustering with Exponential Mixture of PPMI Vectors. (A generic sampling sketch follows the table.)
Open Source Code | No | The paper provides no explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | "We ran our sense induction on two datasets. The first, in order to permit direct comparison with prior work, was the Gigaword corpus (Parker et al. 2011). The second is the freely-available Google Books syntactic n-grams corpus (Goldberg and Orwant 2013). ... We use instances from the SemLink corpus (Palmer 2009), which has VerbNet class annotation."
Dataset Splits | No | The paper mentions a 'test set' in the context of evaluation, but does not specify how the data was divided into training, validation, and test portions (e.g., percentages or sample counts), so the partitioning cannot be reproduced.
Hardware Specification | No | The paper states 'Runtimes are measured in seconds, processed on the same single machine with roughly equivalent optimization,' but does not give that machine's hardware specifications (e.g., CPU or GPU model, memory).
Software Dependencies | No | The paper discusses models and algorithms such as LDA and Dirichlet Multinomial mixtures, but does not list software dependencies with version numbers (e.g., programming-language or library versions such as PyTorch, TensorFlow, or scikit-learn).
Experiment Setup | No | The paper gives a general range for `τ` ('[0.01, 1] produced reasonable results') and notes the use of 100 and 200 topics, but does not report the concrete hyperparameter values (e.g., the exact `τ` behind the best results, the `α` value, the number of iterations, or batch sizes) needed to fully reproduce the experimental setup.
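
As referenced in the Research Type row, the quoted F1 is the simple harmonic mean of the two purity scores mPU and iPU, i.e.:

```latex
F_1 = \frac{2 \cdot \mathrm{mPU} \cdot \mathrm{iPU}}{\mathrm{mPU} + \mathrm{iPU}}
```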
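The paper's pseudocode is not reproduced on this page. As a point of reference for Algorithm 1, below is a minimal sketch of collapsed Gibbs sampling in a Dirichlet Multinomial mixture, assuming each verb instance is a bag of word ids (e.g., its syntactic arguments) assigned to a single sense. This is a generic textbook sketch, not the authors' code; the function name, the `alpha`/`beta` defaults, and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gibbs_dmm_senses(instances, n_senses, vocab_size,
                     alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet Multinomial mixture.

    Each instance is a list of word ids (e.g., the syntactic arguments
    of one verb occurrence) and is assigned a single sense. alpha and
    beta are illustrative defaults; the paper does not report its
    exact settings.
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(n_senses, size=len(instances))   # current sense of each instance
    inst_counts = np.zeros(n_senses)                  # instances per sense
    word_counts = np.zeros((n_senses, vocab_size))    # word counts per sense
    token_counts = np.zeros(n_senses)                 # total tokens per sense

    for d, words in enumerate(instances):             # initialize count tables
        inst_counts[z[d]] += 1
        token_counts[z[d]] += len(words)
        for w in words:
            word_counts[z[d], w] += 1

    for _ in range(n_iters):
        for d, words in enumerate(instances):
            k = z[d]                                  # remove instance d from the counts
            inst_counts[k] -= 1
            token_counts[k] -= len(words)
            for w in words:
                word_counts[k, w] -= 1

            # log P(z_d = k | everything else), computed for all k at once
            log_p = np.log(inst_counts + alpha)
            seen = {}                                 # repeats of w within this instance
            for i, w in enumerate(words):
                m = seen.get(w, 0)
                log_p += np.log(word_counts[:, w] + beta + m)
                log_p -= np.log(token_counts + vocab_size * beta + i)
                seen[w] = m + 1

            p = np.exp(log_p - log_p.max())           # normalize in log space
            k = rng.choice(n_senses, p=p / p.sum())
            z[d] = k                                  # add instance d back under its new sense
            inst_counts[k] += 1
            token_counts[k] += len(words)
            for w in words:
                word_counts[k, w] += 1
    return z

# Toy usage: 4 instances over a 5-word vocabulary, clustered into 2 senses.
print(gibbs_dmm_senses([[0, 1], [0, 1, 1], [3, 4], [3, 4, 4]],
                       n_senses=2, vocab_size=5))
```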