Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Word vs. Class-Based Word Sense Disambiguation

Authors: Ruben Izquierdo, Armando Suarez, German Rigau

JAIR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we also demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the amount of training examples.
Researcher Affiliation | Academia | Ruben Izquierdo (VU University of Amsterdam, Amsterdam, The Netherlands); Armando Suarez (University of Alicante, Alicante, Spain); German Rigau (University of the Basque Country, San Sebastian, Spain)
Pseudocode | Yes | Algorithm 1 presents the pseudo code of the algorithm... Algorithm 1: BLC Extraction
Open Source Code | No | An implementation of this algorithm and the different sets of BLC used in this paper for several WordNet versions are freely available (footnote 9: http://adimen.si.ehu.es/web/BLC). Although the paper states that an implementation is available, the provided link leads to a project page with pre-generated datasets and a description, not directly to a source code repository for the algorithm itself.
Open Datasets | Yes | SemCor (Miller et al., 1993) is a subset of the Brown Corpus... SensEval-2 English all-words corpus (hereinafter SE2) (Palmer, Fellbaum, Cotton, Delfs, & Dang, 2001)... SensEval-3 English all-words corpus (hereinafter SE3) (Snyder & Palmer, 2004)... the SemEval-2 All words Word Sense Disambiguation on a Specific Domain task (Agirre et al., 2010)
Dataset Splits | Yes | Three semantic annotated corpora have been used for training and testing. SemCor for training, and SensEval-2 and SensEval-3 English all-words tasks, for testing... In these experiments, the SemCor files have been randomly selected and added to the training corpus in order to generate subsets of 5%, 10%, 15%, etc. of the training corpus.
Hardware Specification | No | The paper describes experiments and a supervised WSD system but does not provide any specific details about the hardware used for running these experiments (e.g., GPU models, CPU types, or memory).
Software Dependencies | No | In our experiments, we used SVM-Light implementation (Joachims, 1998). We use TreeTagger (Schmid, 1994) to preprocess the documents, performing PoS tagging and lemmatization. The paper names specific software but does not provide version numbers.
Experiment Setup | Yes | We have set this value to 0.01, which has been demonstrated as a good value for SVM in WSD tasks... We set this threshold t to 0.25, obtained empirically with very preliminary versions of the classifiers when applying a cross-validation setting on SemCor... Word-forms and lemmas in a window of 10 words around the target word. PoS, the concatenation of the preceding/following three and five PoS tags. Bigrams and trigrams formed by lemmas and word-forms in a window of 5 words around the target word.
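
The context features quoted under Experiment Setup can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Token` type, the `extract_features` function, and the feature-name strings are hypothetical. The sketch also assumes that a "window of N words" means up to N tokens on each side of the target, that the target itself is excluded from the bag of word-forms and lemmas, and that n-grams are taken from the contiguous span including the target. The paper may define these details differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One preprocessed token (hypothetical container, e.g. filled from a tagger)."""
    form: str
    lemma: str
    pos: str


def extract_features(tokens, i, bag_radius=10, ngram_radius=5):
    """Return a set of string features for the target token at index i.

    Assumption: each radius counts tokens on each side of the target.
    """
    feats = set()

    # Word-forms and lemmas in the wide window (target excluded by assumption).
    lo, hi = max(0, i - bag_radius), min(len(tokens), i + bag_radius + 1)
    for j in range(lo, hi):
        if j != i:
            feats.add("form=" + tokens[j].form.lower())
            feats.add("lemma=" + tokens[j].lemma)

    # Concatenation of the preceding/following three and five PoS tags.
    for k in (3, 5):
        prev_tags = [t.pos for t in tokens[max(0, i - k):i]]
        next_tags = [t.pos for t in tokens[i + 1:i + 1 + k]]
        feats.add(f"pos_prev{k}=" + "_".join(prev_tags))
        feats.add(f"pos_next{k}=" + "_".join(next_tags))

    # Bigrams and trigrams of word-forms and lemmas in the narrow window.
    lo, hi = max(0, i - ngram_radius), min(len(tokens), i + ngram_radius + 1)
    span = tokens[lo:hi]
    for n in (2, 3):
        for s in range(len(span) - n + 1):
            gram = span[s:s + n]
            feats.add(f"form_{n}gram=" + "_".join(t.form.lower() for t in gram))
            feats.add(f"lemma_{n}gram=" + "_".join(t.lemma for t in gram))

    return feats


# Usage: features for "bank" in a toy, hand-tagged sentence.
sentence = [
    Token("The", "the", "DT"),
    Token("bank", "bank", "NN"),
    Token("raised", "raise", "VBD"),
    Token("interest", "interest", "NN"),
    Token("rates", "rate", "NNS"),
]
features = extract_features(sentence, 1)
```

Each string in the returned set would typically be mapped to a binary dimension before being passed to an SVM. That mapping, and the values 0.01 and 0.25 quoted above, are outside the scope of this sketch.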