Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Word vs. Class-Based Word Sense Disambiguation
Authors: Ruben Izquierdo, Armando Suarez, German Rigau
JAIR 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we also demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the amount of training examples. |
| Researcher Affiliation | Academia | Ruben Izquierdo, EMAIL, VU University of Amsterdam, Amsterdam, The Netherlands; Armando Suarez, EMAIL, University of Alicante, Alicante, Spain; German Rigau, EMAIL, University of the Basque Country, San Sebastian, Spain |
| Pseudocode | Yes | Algorithm 1 presents the pseudo code of the algorithm. Algorithm 1 BLC Extraction |
| Open Source Code | No | An implementation of this algorithm and the different sets of BLC used in this paper for several WordNet versions are freely available (footnote 9: http://adimen.si.ehu.es/web/BLC). Although the paper states that an implementation is available, the link leads to a project page with pre-generated datasets and a description, not to a source code repository for the algorithm itself. |
| Open Datasets | Yes | SemCor (Miller et al., 1993) is a subset of the Brown Corpus... SensEval-2 English all-words corpus (hereinafter SE2) (Palmer, Fellbaum, Cotton, Delfs, & Dang, 2001)... SensEval-3 English all-words corpus (hereinafter SE3) (Snyder & Palmer, 2004)... the SemEval-2 All-words Word Sense Disambiguation on a Specific Domain task (Agirre et al., 2010) |
| Dataset Splits | Yes | Three semantically annotated corpora have been used for training and testing: SemCor for training, and the SensEval-2 and SensEval-3 English all-words tasks for testing... In these experiments, the SemCor files have been randomly selected and added to the training corpus in order to generate subsets of 5%, 10%, 15%, etc. of the training corpus. |
| Hardware Specification | No | The paper describes experiments and a supervised WSD system but does not provide any specific details about the hardware used for running these experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | In our experiments, we used the SVM-Light implementation (Joachims, 1998). We use TreeTagger (Schmid, 1994) to preprocess the documents, performing PoS tagging and lemmatization. The paper names specific software but does not provide version numbers. |
| Experiment Setup | Yes | We have set this value to 0.01, which has been demonstrated as a good value for SVM in WSD tasks... We set this threshold t to 0.25, obtained empirically with very preliminary versions of the classifiers when applying a cross-validation setting on SemCor... Word-forms and lemmas in a window of 10 words around the target word. PoS, the concatenation of the preceding/following three and five PoS tags. Bigrams and trigrams formed by lemmas and word-forms in a window of 5 words around the target word. |
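
The feature description quoted under Experiment Setup can be illustrated with a small sketch. This is not the authors' code: the function names, the feature-string format, and the `(word form, lemma, PoS)` token layout are hypothetical. The window interpretation is also an assumption, since the excerpt does not say whether "a window of 10 words" means ten in total or ten per side. Here it is read as five tokens on each side, and "a window of 5 words" as the target plus two tokens on each side.

```python
from typing import List, Tuple

# Hypothetical token layout: (word form, lemma, PoS tag).
Token = Tuple[str, str, str]


def context(tokens: List[Token], i: int, half: int) -> List[Token]:
    """Return up to `half` tokens on each side of position i, target excluded."""
    return tokens[max(0, i - half):i] + tokens[i + 1:i + 1 + half]


def ngrams(items: List[str], n: int) -> List[str]:
    """Return all contiguous n-grams of `items`, joined with underscores."""
    return ["_".join(items[k:k + n]) for k in range(len(items) - n + 1)]


def extract_features(tokens: List[Token], i: int) -> List[str]:
    """Build a bag of string features for the target token at position i."""
    feats: List[str] = []

    # Word forms and lemmas in a 10-word window (ASSUMED: 5 on each side).
    for form, lemma, _ in context(tokens, i, 5):
        feats.append(f"form={form}")
        feats.append(f"lemma={lemma}")

    # Concatenation of the preceding/following three and five PoS tags.
    tags = [pos for _, _, pos in tokens]
    for n in (3, 5):
        feats.append(f"pos_prev{n}=" + "_".join(tags[max(0, i - n):i]))
        feats.append(f"pos_next{n}=" + "_".join(tags[i + 1:i + 1 + n]))

    # Bigrams and trigrams of word forms and lemmas in a 5-word window
    # (ASSUMED: the target plus 2 tokens on each side).
    lo, hi = max(0, i - 2), i + 3
    for name, col in (("form", 0), ("lemma", 1)):
        seq = [t[col] for t in tokens[lo:hi]]
        for n in (2, 3):
            feats.extend(f"{name}{n}gram={g}" for g in ngrams(seq, n))

    return feats
```

Calling `extract_features(sentence, i)` on a tagged and lemmatized sentence yields string features that could then be mapped to the sparse vector format an SVM implementation expects. That mapping is left out because the excerpts do not describe it.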