Semi-Supervised Text Simplification with Back-Translation and Asymmetric Denoising Autoencoders

Authors: Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu (pp. 9668-9675)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Automatic and human evaluations show that our unsupervised model outperforms the previous systems, and with limited supervision, our model can perform competitively with multiple state-of-the-art simplification systems.
Researcher Affiliation | Academia | Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu; MoE Key Lab of Artificial Intelligence, SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; {zhaoyb, chenlusz, zhenchi713, kai.yu}@sjtu.edu.cn
Pseudocode | Yes | Algorithm 1: Our Simplification System
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We use the UNTS dataset (Surya et al. 2018) to train our unsupervised model. ... For semi-supervised training and evaluation, we also use two parallel datasets: WikiLarge (Zhang and Lapata 2017) and the Newsela dataset (Xu, Callison-Burch, and Napoles 2015).
Dataset Splits | Yes | WikiLarge comprises 359 test sentences, 2,000 development sentences, and 300k training sentences; each source sentence in the test set has 8 simplified references. Newsela... The first 1,070 articles are used for training, the next 30 articles for development, and the others for testing. (A split sketch is given below the table.)
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or other detailed computer specifications used for running experiments.
Software Dependencies | No | The paper mentions software components like Transformer, the Adam optimizer, byte-pair encoding, fastText, and LSTM language models, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Our model is built upon Transformer (Vaswani et al. 2017). Both the encoder and decoders have 3 layers with 8 multi-attention heads... The sub-word embeddings are 512-dimensional vectors... In the training process, we use the Adam optimizer (Kingma and Ba 2015); the first momentum was set to 0.5 and the batch size to 16. For reinforcement training, we dynamically adjust the balance parameter γ. At the start of the training process, γ is set to zero... As training progresses, γ is gradually increased and finally converges to 0.9. We use the sigmoid function to perform this process. ... We pre-train the asymmetric denoising autoencoders for 200,000 steps with a learning rate of 1e-4. After that, we add back-translation training with a learning rate of 5e-5. (A γ-schedule sketch is given below the table.)
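For reference, a minimal sketch of the Newsela article-level split quoted in the Dataset Splits row. Only the boundaries (first 1,070 articles for training, next 30 for development, the rest for testing) come from the paper; the function name and input format are assumptions.

```python
# Hypothetical sketch of the Newsela article-level split described above.
def split_newsela(articles):
    """articles: list of Newsela articles in the paper's original ordering."""
    train = articles[:1070]        # first 1,070 articles for training
    dev = articles[1070:1100]      # next 30 articles for development
    test = articles[1100:]         # remaining articles for testing
    return train, dev, test
```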
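Similarly, the γ schedule quoted in the Experiment Setup row (zero at the start, sigmoid-shaped growth, convergence to 0.9) can be sketched as below. The midpoint and steepness constants are illustrative assumptions, since the paper states only that a sigmoid function is used.

```python
import math

def gamma_schedule(step, gamma_max=0.9, midpoint=100_000, steepness=1e-4):
    """Hypothetical sigmoid schedule for the balance parameter gamma.

    The paper states only that gamma starts at zero, grows with a sigmoid,
    and converges to 0.9; midpoint and steepness are illustrative guesses.
    """
    return gamma_max / (1.0 + math.exp(-steepness * (step - midpoint)))

# Example behaviour under these assumed constants:
# gamma_schedule(0)       -> ~0.00004 (effectively zero at the start)
# gamma_schedule(100_000) -> 0.45
# gamma_schedule(300_000) -> ~0.9 (converges to the final value)
```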