Lexical Simplification with Pretrained Encoders

Authors: Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu

AAAI 2020, pp. 8649–8656

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our approach obtains obvious improvement compared with these baselines leveraging linguistic databases and parallel corpus, outperforming the state-of-the-art by more than 12 Accuracy points on three well-known benchmarks."
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Yangzhou University, Jiangsu, China; (2) Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Anhui, China; (3) Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, China
Pseudocode | Yes | Algorithm 1 Simplify(sentence S, complex word w):

    replace word w in S with [MASK] to obtain S'
    concatenate S and S' using [CLS] and [SEP]
    p(·|S, S'\{w}) ← BERT(S, S')
    scs ← top_probability(p(·|S, S'\{w}))
    all_ranks ← ∅
    for each feature f do
        scores ← ∅
        for each sc ∈ scs do
            scores ← scores ∪ {f(sc)}
        end for
        rank ← rank_numbers(scores)
        all_ranks ← all_ranks ∪ {rank}
    end for
    avg_rank ← average(all_ranks)
    best ← argmax_sc(avg_rank)
    return best
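A minimal Python sketch of the candidate-generation step of the algorithm, assuming the Hugging Face `transformers` masked-LM API and the `bert-large-uncased-whole-word-masking` checkpoint (the paper names the model but not the library): the original sentence and its masked copy are fed as a sentence pair, and the top-k fillings of the [MASK] position become the substitution candidates `scs`.

```python
# Sketch of the candidate-generation step, assuming the Hugging Face
# `transformers` BERT masked-LM API (the paper does not name a library).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking")
model = BertForMaskedLM.from_pretrained("bert-large-uncased-whole-word-masking")
model.eval()

def generate_candidates(sentence: str, complex_word: str, k: int = 10):
    """Return the top-k substitution candidates for `complex_word` in `sentence`."""
    masked = sentence.replace(complex_word, tokenizer.mask_token, 1)
    # Sentence pair: [CLS] S [SEP] S' [SEP], where S' is the masked copy.
    inputs = tokenizer(sentence, masked, return_tensors="pt")
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_index].softmax(dim=-1)
    top = probs.topk(k + 1)  # +1 so the complex word itself can be filtered out
    candidates = [tokenizer.convert_ids_to_tokens(i.item()) for i in top.indices]
    return [c for c in candidates if c != complex_word.lower()][:k]

print(generate_candidates("the cat perched on the mat", "perched"))
```

The subsequent ranking loop (per-feature scores, rank averaging, argmax) operates on the list this function returns.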
Open Source Code | Yes | "The code to reproduce our results is available at https://github.com/anonymous."
Open Datasets | Yes | Three widely used lexical simplification datasets are used: (1) LexMTurk (Horn, Manduca, and Kauchak 2014), http://www.cs.pomona.edu/~dkauchak/simplification/lex.mturk.14; (2) BenchLS (Paetzold and Specia 2016), http://ghpaetzold.github.io/data/BenchLS.zip; (3) NNSeval (Paetzold and Specia 2017b), http://ghpaetzold.github.io/data/NNSeval.zip.
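For readers downloading the benchmarks, a hedged parsing sketch: the released BenchLS/NNSeval files appear to be tab-separated lines holding the sentence, the target word, the target's token index, and the gold substitutes as rank:word pairs. This layout is an assumption based on the distributed files, not something the paper documents.

```python
# Hedged parser for the assumed BenchLS/NNSeval layout: tab-separated lines of
# sentence, target word, target token index, then gold substitutes as
# "rank:word" pairs. Layout assumed from the released files, not from the paper.
def load_ls_dataset(path):
    instances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            sentence, target, index = fields[0], fields[1], int(fields[2])
            gold = [g.split(":", 1)[1] for g in fields[3:] if ":" in g]
            instances.append({"sentence": sentence, "target": target,
                              "index": index, "gold": gold})
    return instances

data = load_ls_dataset("BenchLS.txt")  # hypothetical local filename
print(len(data), data[0]["target"], data[0]["gold"][:3])
```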
Dataset Splits | No | The paper evaluates on LexMTurk, BenchLS, and NNSeval but does not provide train/validation/test split percentages, sample counts, or a partitioning methodology; it refers to 'three widely used lexical simplification datasets' without detailing how they were split for the experiments.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or other machine specifications) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions using 'BERT-Large, Uncased (Whole Word Masking)' and 'fastText' but does not provide specific version numbers for these or any other software dependencies needed for replication.
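Since no versions are pinned, a replication attempt may at least want to record the environment it actually ran; a minimal sketch, where the package names (torch, transformers, fasttext) are assumptions since the paper names models rather than libraries:

```python
# Log the versions actually installed, since the paper pins none.
# Package names are assumptions; the paper only names the BERT checkpoint
# "BERT-Large, Uncased (Whole Word Masking)" and fastText.
import importlib.metadata as md

for pkg in ("torch", "transformers", "fasttext"):
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```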
Experiment Setup | No | The paper lacks concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs, optimizer settings), beyond noting that the 'number of simplification candidates ranges from 1 to 15'.
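The one setup detail that is reported, the candidate count ranging from 1 to 15, suggests a simple sweep when reproducing the ranking experiments. A hypothetical sketch, where `generate_candidates` and `evaluate_accuracy` are placeholders rather than functions from the paper's code:

```python
# Hypothetical sweep over the only reported hyperparameter: the number of
# simplification candidates, from 1 to 15. `generate_candidates` and
# `evaluate_accuracy` are placeholder callables, not the paper's code.
def sweep_candidate_counts(dataset, generate_candidates, evaluate_accuracy):
    results = {}
    for k in range(1, 16):
        predictions = [generate_candidates(ex["sentence"], ex["target"], k)
                       for ex in dataset]
        results[k] = evaluate_accuracy(predictions, dataset)
    return results
```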