Leveraging Lexical Substitutes for Unsupervised Word Sense Induction

Authors: Domagoj Alagić, Jan Šnajder, Sebastian Padó

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we investigate the use of an alternative instance representation based on lexical substitutes, i.e., contextually suitable, meaning-preserving replacements. Using lexical substitutes predicted by a state-of-the-art automatic system and a simple clustering algorithm, we outperform bag-of-words instance representations and compete with much more complex structured probabilistic models.
Researcher Affiliation | Academia | Domagoj Alagić, Jan Šnajder, TakeLab, Faculty of Electrical Engineering and Computing, University of Zagreb; Sebastian Padó, Institut für Maschinelle Sprachverarbeitung, University of Stuttgart
Pseudocode | No | The paper describes its algorithms in prose but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "We make this dataset publicly available." (http://takelab.fer.hr/lexsubclu/), but this link covers only the dataset, not the source code for the methodology described in the paper.
Open Datasets | Yes | We carry out all of our experiments on the standard SEMEVAL-2010 WSI dataset (Manandhar et al. 2010), used for the shared task as well as for follow-up research on WSI.
Dataset Splits | No | The paper states that "the SEMEVAL-2010 WSI dataset is split into a training and test portion" and that "we use only the test portion in our experiments", but it does not specify training, validation, and test splits in a way that would allow the data partitioning to be reproduced directly.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions using "the nltk library" and "the affinity propagation implementation of scikit-learn" but does not give version numbers for these software dependencies.
Experiment Setup | Yes | We use the algorithm's default hyperparameters: factor λ of 0.5 and at most 200 iterations, with convergence reached after 15 iterations with no change in the number of estimated clusters. ... To investigate the effect of the number of clusters on WSI, we experiment with two settings for input preference: (1) the default value, set to the median of the similarity matrix, and (2) a value fine-tuned for each POS.
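The quoted setup maps directly onto scikit-learn's `AffinityPropagation` parameters (`damping`, `max_iter`, `convergence_iter`, `preference`). A minimal sketch of setting (1), assuming placeholder substitute-count vectors and cosine similarity (the paper's actual instance representations are not reproduced here):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical stand-in for the paper's instance representations: each row
# is a lexical-substitute count vector for one occurrence of the target word.
rng = np.random.default_rng(42)
X = rng.random((8, 5))

# Cosine similarity between instances (the similarity measure here is an
# assumption for illustration).
norms = np.linalg.norm(X, axis=1)
sim = (X @ X.T) / np.outer(norms, norms)

# Affinity propagation with the hyperparameters quoted above: factor
# lambda (damping) of 0.5, at most 200 iterations, convergence after 15
# iterations with no change in the estimated clusters, and the input
# preference set to the median of the similarity matrix (setting 1).
ap = AffinityPropagation(
    damping=0.5,
    max_iter=200,
    convergence_iter=15,
    affinity="precomputed",
    preference=np.median(sim),
    random_state=0,
)
labels = ap.fit_predict(sim)
print(labels)  # one induced sense label per instance
```

Note that with `affinity="precomputed"` and `preference=None`, scikit-learn already defaults the preference to the median of the input similarities, so setting (1) coincides with the library default; setting (2) would replace `np.median(sim)` with a per-POS tuned value.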