Unsupervised Lexical Simplification for Non-Native Speakers
Authors: Gustavo Paetzold, Lucia Specia
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following Sections, we describe each of the experiments conducted with our LS approach, which we refer henceforth to as LS-NNS. All other approaches hereon mentioned were replicated to the best of our ability. |
| Researcher Affiliation | Academia | Gustavo H. Paetzold and Lucia Specia, University of Sheffield, Western Bank, South Yorkshire S10 2TN, Sheffield, United Kingdom |
| Pseudocode | No | The paper describes algorithms and methods in prose (e.g., 'Candidate Generation Algorithm', 'Boundary Ranking'), but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All methods and resources used in this paper are available in the LEXenstein framework (http://ghpaetzold.github.io/LEXenstein/). |
| Open Datasets | Yes | "We also introduce a new domain-specific dataset for the task, which accounts for the simplification needs of non-native English speakers." and "The resulting dataset, which we refer to as NNSeval, contains 239 instances." and "The evaluation dataset used is the one provided for the English Lexical Simplification task of SemEval 2012 (Specia, Jauhar, and Mihalcea 2012)." |
| Dataset Splits | No | The paper uses the NNSeval dataset for evaluation and mentions '10-fold cross validation' for learning the decision boundary in Substitution Selection. However, it does not provide explicit training, validation, and test dataset splits (percentages or counts) for the overall experiments or a fixed partitioning of the NNSeval dataset. |
| Hardware Specification | No | The paper mentions training models on large corpora (e.g., 7 billion words for word embeddings) and using tools like word2vec and Stanford Parser, but does not specify any hardware details such as GPU models, CPU types, or memory used for these operations. |
| Software Dependencies | No | The paper mentions several software tools and libraries used (e.g., 'word2vec toolkit', 'Stanford Parser', 'SRILM'), but it does not specify their exact version numbers, which is necessary for reproducible software dependencies. |
| Experiment Setup | Yes | "For training, we use the bag-of-words model (CBOW), and 500 dimensions for the embedding vectors." and "We learn the decision boundary through Stochastic Gradient Descent with 10-fold cross validation." and "For Substitution Ranking we use 5-gram probabilities with two tokens to the left and right of the candidate." |
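
The Experiment Setup row quotes the paper's key hyperparameters: 500-dimension CBOW word embeddings and a substitution-selection decision boundary learned via Stochastic Gradient Descent with 10-fold cross-validation. Below is a minimal sketch of how such a setup could be approximated with gensim and scikit-learn; the corpus, feature matrix, and labels are hypothetical placeholders, and the authors' actual implementation (distributed via LEXenstein) may differ in tooling and details.

```python
# Minimal sketch of the reported experiment setup (not the authors' code).
# Assumes gensim >= 4.0 and scikit-learn; `corpus_sentences`, `X`, and `y`
# are hypothetical placeholders for the training corpus and the
# substitution-selection features/labels described in the paper.
from gensim.models import Word2Vec
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# 1) Word embeddings: CBOW architecture, 500-dimensional vectors,
#    matching the quoted setup (the paper trains on a ~7 billion word corpus).
corpus_sentences = [["an", "example", "tokenized", "sentence"]]  # placeholder corpus
embeddings = Word2Vec(
    sentences=corpus_sentences,
    vector_size=500,  # 500 dimensions for the embedding vectors
    sg=0,             # sg=0 selects the CBOW model
    min_count=1,      # lowered only so the toy corpus is not filtered out
    workers=4,
)

# 2) Substitution Selection: learn a decision boundary with Stochastic
#    Gradient Descent, evaluated here via 10-fold cross-validation.
X = [[i, i % 2] for i in range(20)]  # placeholder candidate features
y = [i % 2 for i in range(20)]       # placeholder keep/discard labels
boundary = SGDClassifier(loss="hinge", max_iter=1000)
scores = cross_val_score(boundary, X, y, cv=10)
print("10-fold CV accuracy:", scores.mean())

# 3) Substitution Ranking uses 5-gram language model probabilities with two
#    tokens of context on each side (SRILM in the paper); omitted here.
```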