Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders
Authors: Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu (pp. 9668-9675)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Automatic and human evaluations show that our unsupervised model outperforms the previous systems, and with limited supervision, our model can perform competitively with multiple state-of-the-art simplification systems. |
| Researcher Affiliation | Academia | Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu, MoE Key Lab of Artificial Intelligence, Speech Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China {zhaoyb, chenlusz, zhenchi713, kai.yu}@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1: Our Simplification System |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use the UNTS dataset (Surya et al. 2018) to train our unsupervised model. ... For semi-supervised training and evaluation, we also use two parallel datasets: WikiLarge (Zhang and Lapata 2017) and the Newsela dataset (Xu, Callison-Burch, and Napoles 2015). |
| Dataset Splits | Yes | WikiLarge comprises 359 test sentences, 2,000 development sentences, and 300k training sentences. Each source sentence in the test set has 8 simplified references. Newsela... The first 1,070 articles are used for training, the next 30 articles for development, and the remaining articles for testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or other detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software components such as the Transformer, the Adam optimizer, byte-pair encoding, fastText, and LSTM language models, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Our model is built upon Transformer (Vaswani et al. 2017). Both encoder and decoders have 3 layers with 8 attention heads... The sub-word embeddings are 512-dimensional vectors... In the training process, we use the Adam optimizer (Kingma and Ba 2015); the first momentum was set to 0.5 and the batch size to 16. For reinforcement training, we dynamically adjust the balance parameter γ. At the start of the training process, γ is set to zero... As training progresses, γ is gradually increased and finally converges to 0.9. We use the sigmoid function to perform this process. ... We pre-train the asymmetric denoising autoencoders for 200,000 steps with a learning rate of 1e-4. After that, we add back-translation training with a learning rate of 5e-5. |
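
The Experiment Setup row quotes concrete hyperparameters: Adam with first momentum 0.5, batch size 16, a sigmoid ramp of the balance parameter γ from 0 toward 0.9, and a two-stage learning rate of 1e-4 (pre-training) then 5e-5 (back-translation). The sketch below is not the authors' code; it only illustrates how those quoted values could be wired up. The sigmoid midpoint and steepness, and all function and variable names, are assumptions, since the paper only states that γ starts at 0 and converges to 0.9 via a sigmoid.

```python
# Minimal sketch (assumed names/parameters, not the authors' implementation).
import math

import torch

GAMMA_MAX = 0.9           # final MLE/RL balance reported in the paper
PRETRAIN_STEPS = 200_000  # denoising-autoencoder pre-training steps (from the paper)


def gamma_schedule(step: int, midpoint: int = 50_000, steepness: float = 1e-4) -> float:
    """Sigmoid ramp from ~0 toward GAMMA_MAX; midpoint and steepness are assumed."""
    return GAMMA_MAX / (1.0 + math.exp(-steepness * (step - midpoint)))


def make_optimizer(model: torch.nn.Module, pretraining: bool) -> torch.optim.Adam:
    """Adam with first momentum beta1 = 0.5; lr 1e-4 for pre-training, 5e-5 afterwards."""
    lr = 1e-4 if pretraining else 5e-5
    return torch.optim.Adam(model.parameters(), lr=lr, betas=(0.5, 0.999))
```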
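
The Pseudocode row notes "Algorithm 1: Our Simplification System" but does not reproduce it. Below is a hedged structural sketch of the two-phase schedule the Experiment Setup row describes (pre-train the asymmetric denoising autoencoders, then add back-translation training). All callables are hypothetical stand-ins, and keeping the denoising updates during the second phase is an assumption based on common unsupervised MT practice, not something stated in this excerpt.

```python
# Structural sketch only: the step functions are hypothetical placeholders.
from typing import Callable, List

Batch = List[str]


def train_schedule(
    denoising_step: Callable[[Batch], float],         # asymmetric denoising AE update
    back_translation_step: Callable[[Batch], float],  # back-translation update
    sample_complex: Callable[[], Batch],              # draw a batch of complex sentences
    sample_simple: Callable[[], Batch],               # draw a batch of simple sentences
    pretrain_steps: int = 200_000,                    # from the paper
    finetune_steps: int = 100_000,                    # assumed; not stated in the excerpt
) -> None:
    # Phase 1: pre-train the asymmetric denoising autoencoders (lr 1e-4 in the paper).
    for _ in range(pretrain_steps):
        denoising_step(sample_complex())
        denoising_step(sample_simple())
    # Phase 2: add back-translation training (lr 5e-5 in the paper); whether the
    # denoising losses are retained here is an assumption.
    for _ in range(finetune_steps):
        denoising_step(sample_complex())
        denoising_step(sample_simple())
        back_translation_step(sample_complex())
        back_translation_step(sample_simple())
```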