Bayesian Verb Sense Clustering

Authors: Daniel Peterson, Martha Palmer

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Relative to the prior state of the art, we improve accuracy on verb sense induction by over 20% absolute F1. ... Our best model shows a 4.5% absolute F1 improvement over the best non-PPMI model, with over an order of magnitude less computation time. Table 1 shows the clustering mPU, iPU, and F1 score (simple harmonic mean of mPU and iPU) for senses induced from various models (trained on Gigaword or Google Books syntactic n-grams corpora, with 100 and 200 topics)." (The F1 definition is written out below the table.)
Researcher Affiliation | Academia | Daniel W. Peterson, Martha Palmer, University of Colorado, {daniel.w.peterson,martha.palmer}@colorado.edu
Pseudocode | Yes | Algorithm 1: Sampling verb senses in the Dirichlet Multinomial mixture; Algorithm 2: Sampling verb senses with common topics; Algorithm 3: Clustering with Exponential Mixture of PPMI Vectors. (A generic sampling sketch follows the table.)
Open Source Code | No | The paper provides no explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | "We ran our sense induction on two datasets. The first, in order to permit direct comparison with prior work, was the Gigaword corpus (Parker et al. 2011). The second is the freely-available Google Books syntactic n-grams corpus (Goldberg and Orwant 2013). ... We use instances from the SemLink corpus (Palmer 2009), which has VerbNet class annotation."
Dataset Splits | No | The paper mentions a 'test set' in the context of evaluation, but does not specify how the data was divided into training, validation, and test portions (e.g., percentages or sample counts), so the partitioning cannot be reproduced.
Hardware Specification | No | The paper states 'Runtimes are measured in seconds, processed on the same single machine with roughly equivalent optimization,' but does not give that machine's hardware specifications (e.g., CPU or GPU model, memory).
Software Dependencies | No | The paper discusses models and algorithms such as LDA and Dirichlet Multinomial mixtures, but does not list software dependencies with version numbers (e.g., programming-language or library versions such as PyTorch, TensorFlow, or scikit-learn).
Experiment Setup | No | The paper gives a general range for `τ` ('[0.01, 1] produced reasonable results') and notes the use of 100 and 200 topics, but does not report the concrete hyperparameter values (e.g., the exact `τ` behind the best results, the `α` value, the number of iterations, or batch sizes) needed to fully reproduce the experimental setup.
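
As referenced in the Research Type row, the quoted F1 is the simple harmonic mean of the two purity scores mPU and iPU, i.e.:

```latex
F_1 = \frac{2 \cdot \mathrm{mPU} \cdot \mathrm{iPU}}{\mathrm{mPU} + \mathrm{iPU}}
```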
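The paper's pseudocode is not reproduced on this page. As a point of reference for Algorithm 1, below is a minimal sketch of collapsed Gibbs sampling in a Dirichlet Multinomial mixture, assuming each verb instance is a bag of word ids (e.g., its syntactic arguments) assigned to a single sense. This is a generic textbook sketch, not the authors' code; the function name, the `alpha`/`beta` defaults, and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gibbs_dmm_senses(instances, n_senses, vocab_size,
                     alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet Multinomial mixture.

    Each instance is a list of word ids (e.g., the syntactic arguments
    of one verb occurrence) and is assigned a single sense. alpha and
    beta are illustrative defaults; the paper does not report its
    exact settings.
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(n_senses, size=len(instances))   # current sense of each instance
    inst_counts = np.zeros(n_senses)                  # instances per sense
    word_counts = np.zeros((n_senses, vocab_size))    # word counts per sense
    token_counts = np.zeros(n_senses)                 # total tokens per sense

    for d, words in enumerate(instances):             # initialize count tables
        inst_counts[z[d]] += 1
        token_counts[z[d]] += len(words)
        for w in words:
            word_counts[z[d], w] += 1

    for _ in range(n_iters):
        for d, words in enumerate(instances):
            k = z[d]                                  # remove instance d from the counts
            inst_counts[k] -= 1
            token_counts[k] -= len(words)
            for w in words:
                word_counts[k, w] -= 1

            # log P(z_d = k | everything else), computed for all k at once
            log_p = np.log(inst_counts + alpha)
            seen = {}                                 # repeats of w within this instance
            for i, w in enumerate(words):
                m = seen.get(w, 0)
                log_p += np.log(word_counts[:, w] + beta + m)
                log_p -= np.log(token_counts + vocab_size * beta + i)
                seen[w] = m + 1

            p = np.exp(log_p - log_p.max())           # normalize in log space
            k = rng.choice(n_senses, p=p / p.sum())
            z[d] = k                                  # add instance d back under its new sense
            inst_counts[k] += 1
            token_counts[k] += len(words)
            for w in words:
                word_counts[k, w] += 1
    return z

# Toy usage: 4 instances over a 5-word vocabulary, clustered into 2 senses.
print(gibbs_dmm_senses([[0, 1], [0, 1, 1], [3, 4], [3, 4, 4]],
                       n_senses=2, vocab_size=5))
```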