Unsupervised Lexical Simplification for Non-Native Speakers
Authors: Gustavo Paetzold, Lucia Specia
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following Sections, we describe each of the experiments conducted with our LS approach, which we refer henceforth to as LS-NNS. All other approaches hereon mentioned were replicated to the best of our ability. |
| Researcher Affiliation | Academia | Gustavo H. Paetzold and Lucia Specia, University of Sheffield, Western Bank, South Yorkshire S10 2TN, Sheffield, United Kingdom |
| Pseudocode | No | The paper describes algorithms and methods in prose (e.g., 'Candidate Generation Algorithm', 'Boundary Ranking'), but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | All methods and resources used in this paper are available in the LEXenstein framework (http://ghpaetzold.github.io/LEXenstein/). |
| Open Datasets | Yes | "We also introduce a new domain-specific dataset for the task, which accounts for the simplification needs of non-native English speakers." and "The resulting dataset, which we refer to as NNSeval, contains 239 instances." and "The evaluation dataset used is the one provided for the English Lexical Simplification task of SemEval 2012 (Specia, Jauhar, and Mihalcea 2012)." |
| Dataset Splits | No | The paper uses the NNSeval dataset for evaluation and mentions '10-fold cross validation' for learning the decision boundary in Substitution Selection. However, it does not provide explicit training, validation, and test dataset splits (percentages or counts) for the overall experiments or a fixed partitioning of the NNSeval dataset. |
| Hardware Specification | No | The paper mentions training models on large corpora (e.g., 7 billion words for word embeddings) and using tools like word2vec and Stanford Parser, but does not specify any hardware details such as GPU models, CPU types, or memory used for these operations. |
| Software Dependencies | No | The paper mentions several software tools and libraries used (e.g., 'word2vec toolkit', 'Stanford Parser', 'SRILM'), but it does not specify their exact version numbers, which is necessary for reproducible software dependencies. |
| Experiment Setup | Yes | "For training, we use the bag-of-words model (CBOW), and 500 dimensions for the embedding vectors." and "We learn the decision boundary through Stochastic Gradient Descent with 10-fold cross validation." and "For Substitution Ranking we use 5-gram probabilities with two tokens to the left and right of the candidate." |
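
The Experiment Setup row quotes the paper's key hyperparameters: 500-dimension CBOW word embeddings and a substitution-selection decision boundary learned via Stochastic Gradient Descent with 10-fold cross-validation. Below is a minimal sketch of how such a setup could be approximated with gensim and scikit-learn; the corpus, feature matrix, and labels are hypothetical placeholders, and the authors' actual implementation (distributed via LEXenstein) may differ in tooling and details.

```python
# Minimal sketch of the reported experiment setup (not the authors' code).
# Assumes gensim >= 4.0 and scikit-learn; `corpus_sentences`, `X`, and `y`
# are hypothetical placeholders for the training corpus and the
# substitution-selection features/labels described in the paper.
from gensim.models import Word2Vec
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# 1) Word embeddings: CBOW architecture, 500-dimensional vectors,
#    matching the quoted setup (the paper trains on a ~7 billion word corpus).
corpus_sentences = [["an", "example", "tokenized", "sentence"]]  # placeholder corpus
embeddings = Word2Vec(
    sentences=corpus_sentences,
    vector_size=500,  # 500 dimensions for the embedding vectors
    sg=0,             # sg=0 selects the CBOW model
    min_count=1,      # lowered only so the toy corpus is not filtered out
    workers=4,
)

# 2) Substitution Selection: learn a decision boundary with Stochastic
#    Gradient Descent, evaluated here via 10-fold cross-validation.
X = [[i, i % 2] for i in range(20)]  # placeholder candidate features
y = [i % 2 for i in range(20)]       # placeholder keep/discard labels
boundary = SGDClassifier(loss="hinge", max_iter=1000)
scores = cross_val_score(boundary, X, y, cv=10)
print("10-fold CV accuracy:", scores.mean())

# 3) Substitution Ranking uses 5-gram language model probabilities with two
#    tokens of context on each side (SRILM in the paper); omitted here.
```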