Lexical Simplification with Pretrained Encoders
Authors: Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Xindong Wu
AAAI 2020, pp. 8649–8656
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | experimental results show that our approach obtains obvious improvement compared with these baselines leveraging linguistic databases and parallel corpus, outperforming the state-of-the-art by more than 12 Accuracy points on three well-known benchmarks. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Yangzhou University, Jiangsu, China 2Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Anhui, China 3Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Simplify(sentence S, complex word w) 1: Replace word w of S into [MASK] as S' 2: Concatenate S and S' using [CLS] and [SEP] 3: p(·\|S, S'\{w}) ← BERT(S, S') 4: scs ← top_probability(p(·\|S, S'\{w})) 5: all_ranks ← ∅ 6: for each feature f do 7: scores ← ∅ 8: for each sc ∈ scs do 9: scores ← scores ∪ f(sc) 10: end for 11: rank ← rank_numbers(scores) 12: all_ranks ← all_ranks ∪ rank 13: end for 14: avg_rank ← average(all_ranks) 15: best ← argmax_sc(avg_rank) 16: return best (a Python sketch of this procedure appears below the table) |
| Open Source Code | Yes | The code to reproduce our results is available at https://github.com/anonymous. |
| Open Datasets | Yes | We use three widely used lexical simplification datasets to do experiments: (1) LexMTurk (Horn, Manduca, and Kauchak 2014); (2) BenchLS (Paetzold and Specia 2016); (3) NNSeval (Paetzold and Specia 2017b). Links are provided in footnotes: http://www.cs.pomona.edu/~dkauchak/simplification/lex.mturk.14, http://ghpaetzold.github.io/data/BenchLS.zip, http://ghpaetzold.github.io/data/NNSeval.zip |
| Dataset Splits | No | The paper evaluates on the LexMTurk, BenchLS, and NNSeval datasets but does not provide train/validation/test split percentages, sample counts, or a methodology for partitioning the data. It refers to 'three widely used lexical simplification datasets' without detailing how they were split for the experiments. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using 'BERT-Large, Uncased (Whole Word Masking)' and 'fastText' but does not provide specific version numbers for these or any other software dependencies needed for replication. |
| Experiment Setup | No | The paper lacks specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, epochs, optimizer settings); it only mentions that the number of simplification candidates ranges from 1 to 15. |
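
The pseudocode extracted above (Algorithm 1) generates substitution candidates with BERT's masked-language-model head and then averages per-feature ranks to pick the best substitute. Below is a minimal Python sketch of that procedure using the Hugging Face `transformers` library. It is an illustration under assumptions, not the authors' released implementation: the paper ranks candidates with several features (BERT prediction probability, language-model probability, word frequency, fastText similarity), whereas this sketch uses only the BERT probability plus a hypothetical `word_frequency` placeholder standing in for the remaining features.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

# BERT-Large, Uncased (Whole Word Masking), as named in the paper.
MODEL_NAME = "bert-large-uncased-whole-word-masking"
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()


def word_frequency(word: str) -> float:
    """Hypothetical placeholder for a corpus-frequency feature; the paper's
    ranking also uses frequency and fastText similarity. Returns a dummy score."""
    return 0.0


def simplify(sentence: str, complex_word: str, top_k: int = 10) -> str:
    # Steps 1-2: mask the complex word and pair the original and masked sentences,
    # so BERT sees "[CLS] S [SEP] S' [SEP]".
    masked = sentence.replace(complex_word, tokenizer.mask_token, 1)
    inputs = tokenizer(sentence, masked, return_tensors="pt")

    # Steps 3-4: read BERT's distribution at the [MASK] position, keep top-k candidates.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits[0, mask_pos], dim=-1)
    top_probs, top_ids = probs.topk(top_k)
    candidates = tokenizer.convert_ids_to_tokens(top_ids.tolist())

    # Steps 5-13: score candidates under each feature and convert scores to ranks
    # (higher score -> larger rank number).
    features = [
        lambda cand, i: top_probs[i].item(),   # BERT prediction probability
        lambda cand, i: word_frequency(cand),  # stand-in for the frequency feature
    ]
    all_ranks = []
    for f in features:
        scores = [f(c, i) for i, c in enumerate(candidates)]
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        ranks = [0] * len(scores)
        for rank, idx in enumerate(order):
            ranks[idx] = rank  # worst score gets rank 0, best gets the largest rank
        all_ranks.append(ranks)

    # Steps 14-16: average ranks across features and return the argmax candidate.
    avg_rank = [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(len(candidates))]
    best = max(range(len(candidates)), key=lambda i: avg_rank[i])
    return candidates[best]


# Example usage (downloads the ~1.3 GB model on first run):
print(simplify("John composed these verses.", "composed"))
```

The sentence-pair input mirrors the paper's idea of conditioning candidate generation on the original sentence as well as the masked one, so candidates tend to preserve the sentence's meaning rather than just fit the local context.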