Lexical Simplification with Pretrained Encoders

Authors: Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu

AAAI 2020, pp. 8649–8656

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our approach obtains obvious improvement compared with these baselines leveraging linguistic databases and parallel corpus, outperforming the state-of-the-art by more than 12 Accuracy points on three well-known benchmarks."
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Yangzhou University, Jiangsu, China; (2) Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Anhui, China; (3) Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, China
Pseudocode | Yes | Algorithm 1 Simplify(sentence S, complex word w):

    replace word w in S with [MASK] to obtain S'
    concatenate S and S' using [CLS] and [SEP]
    p(·|S, S'\{w}) ← BERT(S, S')
    scs ← top_probability(p(·|S, S'\{w}))
    all_ranks ← ∅
    for each feature f do
        scores ← ∅
        for each sc ∈ scs do
            scores ← scores ∪ {f(sc)}
        end for
        rank ← rank_numbers(scores)
        all_ranks ← all_ranks ∪ {rank}
    end for
    avg_rank ← average(all_ranks)
    best ← argmax_sc(avg_rank)
    return best
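A minimal Python sketch of the candidate-generation step of the algorithm, assuming the Hugging Face `transformers` masked-LM API and the `bert-large-uncased-whole-word-masking` checkpoint (the paper names the model but not the library): the original sentence and its masked copy are fed as a sentence pair, and the top-k fillings of the [MASK] position become the substitution candidates `scs`.

```python
# Sketch of the candidate-generation step, assuming the Hugging Face
# `transformers` BERT masked-LM API (the paper does not name a library).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking")
model = BertForMaskedLM.from_pretrained("bert-large-uncased-whole-word-masking")
model.eval()

def generate_candidates(sentence: str, complex_word: str, k: int = 10):
    """Return the top-k substitution candidates for `complex_word` in `sentence`."""
    masked = sentence.replace(complex_word, tokenizer.mask_token, 1)
    # Sentence pair: [CLS] S [SEP] S' [SEP], where S' is the masked copy.
    inputs = tokenizer(sentence, masked, return_tensors="pt")
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_index].softmax(dim=-1)
    top = probs.topk(k + 1)  # +1 so the complex word itself can be filtered out
    candidates = [tokenizer.convert_ids_to_tokens(i.item()) for i in top.indices]
    return [c for c in candidates if c != complex_word.lower()][:k]

print(generate_candidates("the cat perched on the mat", "perched"))
```

The subsequent ranking loop (per-feature scores, rank averaging, argmax) operates on the list this function returns.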
Open Source Code | Yes | "The code to reproduce our results is available at https://github.com/anonymous."
Open Datasets | Yes | Three widely used lexical simplification datasets are used: (1) LexMTurk (Horn, Manduca, and Kauchak 2014), http://www.cs.pomona.edu/~dkauchak/simplification/lex.mturk.14; (2) BenchLS (Paetzold and Specia 2016), http://ghpaetzold.github.io/data/BenchLS.zip; (3) NNSeval (Paetzold and Specia 2017b), http://ghpaetzold.github.io/data/NNSeval.zip.
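For readers downloading the benchmarks, a hedged parsing sketch: the released BenchLS/NNSeval files appear to be tab-separated lines holding the sentence, the target word, the target's token index, and the gold substitutes as rank:word pairs. This layout is an assumption based on the distributed files, not something the paper documents.

```python
# Hedged parser for the assumed BenchLS/NNSeval layout: tab-separated lines of
# sentence, target word, target token index, then gold substitutes as
# "rank:word" pairs. Layout assumed from the released files, not from the paper.
def load_ls_dataset(path):
    instances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            sentence, target, index = fields[0], fields[1], int(fields[2])
            gold = [g.split(":", 1)[1] for g in fields[3:] if ":" in g]
            instances.append({"sentence": sentence, "target": target,
                              "index": index, "gold": gold})
    return instances

data = load_ls_dataset("BenchLS.txt")  # hypothetical local filename
print(len(data), data[0]["target"], data[0]["gold"][:3])
```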
Dataset Splits | No | The paper evaluates on LexMTurk, BenchLS, and NNSeval but does not provide train/validation/test split percentages, sample counts, or a partitioning methodology; it refers to 'three widely used lexical simplification datasets' without detailing how they were split for the experiments.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or other machine specifications) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions using 'BERT-Large, Uncased (Whole Word Masking)' and 'fastText' but does not provide specific version numbers for these or any other software dependencies needed for replication.
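Since no versions are pinned, a replication attempt may at least want to record the environment it actually ran; a minimal sketch, where the package names (torch, transformers, fasttext) are assumptions since the paper names models rather than libraries:

```python
# Log the versions actually installed, since the paper pins none.
# Package names are assumptions; the paper only names the BERT checkpoint
# "BERT-Large, Uncased (Whole Word Masking)" and fastText.
import importlib.metadata as md

for pkg in ("torch", "transformers", "fasttext"):
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```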
Experiment Setup | No | The paper lacks concrete experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs, optimizer settings), beyond noting that the 'number of simplification candidates ranges from 1 to 15'.
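The one setup detail that is reported, the candidate count ranging from 1 to 15, suggests a simple sweep when reproducing the ranking experiments. A hypothetical sketch, where `generate_candidates` and `evaluate_accuracy` are placeholders rather than functions from the paper's code:

```python
# Hypothetical sweep over the only reported hyperparameter: the number of
# simplification candidates, from 1 to 15. `generate_candidates` and
# `evaluate_accuracy` are placeholder callables, not the paper's code.
def sweep_candidate_counts(dataset, generate_candidates, evaluate_accuracy):
    results = {}
    for k in range(1, 16):
        predictions = [generate_candidates(ex["sentence"], ex["target"], k)
                       for ex in dataset]
        results[k] = evaluate_accuracy(predictions, dataset)
    return results
```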