Leveraging Lexical Substitutes for Unsupervised Word Sense Induction

Authors: Domagoj Alagić, Jan Šnajder, Sebastian Padó

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we investigate the use of an alternative instance representation based on lexical substitutes, i.e., contextually suitable, meaning-preserving replacements. Using lexical substitutes predicted by a state-of-the-art automatic system and a simple clustering algorithm, we outperform bag-of-words instance representations and compete with much more complex structured probabilistic models.
Researcher Affiliation | Academia | Domagoj Alagić, Jan Šnajder, TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb; Sebastian Padó, Institut für Maschinelle Sprachverarbeitung, University of Stuttgart
Pseudocode | No | The paper describes its algorithms in prose but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "We make this dataset publicly available." (http://takelab.fer.hr/lexsubclu/), but this link covers only the dataset, not the source code for the methodology described in the paper.
Open Datasets | Yes | We carry out all of our experiments on the standard SEMEVAL-2010 WSI dataset (Manandhar et al. 2010), used for the shared task as well as for follow-up research on WSI.
Dataset Splits | No | The paper states that "the SEMEVAL-2010 WSI dataset is split into a training and test portion" and that "we use only the test portion in our experiments", but it does not specify training, validation, and test splits in a way that would allow the data partitioning to be reproduced directly.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions using "the nltk library" and "the affinity propagation implementation of scikit-learn" but does not give version numbers for these software dependencies.
Experiment Setup | Yes | We use the algorithm's default hyperparameters: factor λ of 0.5 and at most 200 iterations, with convergence reached after 15 iterations with no change in the number of estimated clusters. ... To investigate the effect of the number of clusters on WSI, we experiment with two settings for input preference: (1) the default value, set to the median of the similarity matrix, and (2) a value fine-tuned for each POS.
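The quoted setup maps directly onto scikit-learn's `AffinityPropagation` parameters (`damping`, `max_iter`, `convergence_iter`, `preference`). A minimal sketch of setting (1), assuming placeholder substitute-count vectors and cosine similarity (the paper's actual instance representations are not reproduced here):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical stand-in for the paper's instance representations: each row
# is a lexical-substitute count vector for one occurrence of the target word.
rng = np.random.default_rng(42)
X = rng.random((8, 5))

# Cosine similarity between instances (the similarity measure here is an
# assumption for illustration).
norms = np.linalg.norm(X, axis=1)
sim = (X @ X.T) / np.outer(norms, norms)

# Affinity propagation with the hyperparameters quoted above: factor
# lambda (damping) of 0.5, at most 200 iterations, convergence after 15
# iterations with no change in the estimated clusters, and the input
# preference set to the median of the similarity matrix (setting 1).
ap = AffinityPropagation(
    damping=0.5,
    max_iter=200,
    convergence_iter=15,
    affinity="precomputed",
    preference=np.median(sim),
    random_state=0,
)
labels = ap.fit_predict(sim)
print(labels)  # one induced sense label per instance
```

Note that with `affinity="precomputed"` and `preference=None`, scikit-learn already defaults the preference to the median of the input similarities, so setting (1) coincides with the library default; setting (2) would replace `np.median(sim)` with a per-POS tuned value.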