Unsupervised Text Style Transfer using Language Models as Discriminators
Authors: Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor Berg-Kirkpatrick
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of our new approach, we conduct experiments on three tasks: word substitution decipherment, sentiment modification, and related language translation. We compare our model with previous work that uses convolutional networks (CNNs) as discriminators, as well as a broad set of other approaches. Results show that the proposed method achieves improved performance on all three tasks. |
| Researcher Affiliation | Collaboration | Zichao Yang1, Zhiting Hu1, Chris Dyer2, Eric P. Xing1, Taylor Berg-Kirkpatrick1 1Carnegie Mellon University, 2DeepMind {zichaoy, zhitingh, epxing, tberg}@cs.cmu.edu cdyer@google.com |
| Pseudocode | No | The paper contains diagrams and mathematical formulations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'We implement our model with the Texar (Hu et al., 2018b) toolbox based on Tensorflow (Abadi et al., 2016).' and references using the code from 'https://github.com/shentianxiao/language-style-transfer' for comparison, but does not provide an explicit statement or link to their own open-source code for the methodology described in this paper. |
| Open Datasets | Yes | Following (Shen et al., 2017), we sample 200K sentences from the Yelp review dataset as plain text X and sample another 200K sentences and apply a word substitution cipher on these sentences to get Y. We use the monolingual data from the Leipzig Corpora Collection (http://wortschatz.uni-leipzig.de/en). For the zh-CN and zh-TW pair, we use the monolingual data from the Chinese Gigaword corpus. |
| Dataset Splits | Yes | We use another 100k parallel sentences as the development and test set respectively. The dataset contains 250K negative sentences (denoted as X) and 380K positive sentences (denoted as Y), of which 70% are used for training, 10% for development, and the remaining 20% as the test set. For the other dataset, 80% is used for training, 10% for validation, and the remaining 10% for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states, 'We implement our model with the Texar (Hu et al., 2018b) toolbox based on Tensorflow (Abadi et al., 2016)', but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | Our model configurations are included in Appendix B. The encoder and decoder are both 2-layer LSTMs with 256 hidden dimensions. Word embeddings are 256 dimensions. The vocabulary size is 10k. We use Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001. We train for 20 epochs with a batch size of 64. |
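The reported configuration (2-layer LSTM encoder and decoder, 256 hidden dimensions, 256-dimensional word embeddings, 10k vocabulary, Adam with learning rate 0.001, 20 epochs, batch size 64) is concrete enough to express as code. The authors implemented their model with Texar on TensorFlow; the sketch below is only an illustrative PyTorch approximation of those reported sizes, and every name in it (e.g. `StyleTransferSeq2Seq`) is hypothetical rather than taken from the paper's code.

```python
# Minimal sketch of the reported configuration: 2-layer LSTM encoder/decoder,
# 256-dim embeddings and hidden states, 10k vocabulary, Adam @ 1e-3, batch size 64.
# Illustrative only; the paper's implementation uses Texar on TensorFlow.
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000
EMB_DIM = 256
HIDDEN_DIM = 256
NUM_LAYERS = 2

class StyleTransferSeq2Seq(nn.Module):
    """Hypothetical encoder-decoder skeleton matching the reported sizes."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.encoder = nn.LSTM(EMB_DIM, HIDDEN_DIM, num_layers=NUM_LAYERS,
                               batch_first=True)
        self.decoder = nn.LSTM(EMB_DIM, HIDDEN_DIM, num_layers=NUM_LAYERS,
                               batch_first=True)
        self.output_proj = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence; reuse the final LSTM states to seed the decoder.
        _, state = self.encoder(self.embedding(src_ids))
        dec_out, _ = self.decoder(self.embedding(tgt_ids), state)
        return self.output_proj(dec_out)  # unnormalized token logits

model = StyleTransferSeq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reported learning rate

# Illustrative training step on a dummy batch of 64 sentences of length 20.
src = torch.randint(0, VOCAB_SIZE, (64, 20))
tgt = torch.randint(0, VOCAB_SIZE, (64, 20))
logits = model(src, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB_SIZE), tgt.reshape(-1))
loss.backward()
optimizer.step()
```

This sketch covers only the sequence-to-sequence backbone and optimizer settings quoted above; it omits the paper's language-model discriminator and style-transfer training objective, which the summary table does not specify in enough detail to reconstruct.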