Unsupervised Text Style Transfer using Language Models as Discriminators
Authors: Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor Berg-Kirkpatrick
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of our new approach, we conduct experiments on three tasks: word substitution decipherment, sentiment modification, and related language translation. We compare our model with previous work that uses convolutional networks (CNNs) as discriminators, as well as a broad set of other approaches. Results show that the proposed method achieves improved performance on all three tasks. |
| Researcher Affiliation | Collaboration | Zichao Yang1, Zhiting Hu1, Chris Dyer2, Eric P. Xing1, Taylor Berg-Kirkpatrick1 1Carnegie Mellon University, 2DeepMind {zichaoy, zhitingh, epxing, tberg}@cs.cmu.edu cdyer@google.com |
| Pseudocode | No | The paper contains diagrams and mathematical formulations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'We implement our model with the Texar (Hu et al., 2018b) toolbox based on Tensorflow (Abadi et al., 2016).' and references using the code from 'https://github.com/shentianxiao/language-style-transfer' for comparison, but does not provide an explicit statement or link to their own open-source code for the methodology described in this paper. |
| Open Datasets | Yes | Following (Shen et al., 2017), we sample 200K sentences from the Yelp review dataset as plain text X and sample another 200K sentences and apply a word substitution cipher on these sentences to get Y. We use the monolingual data from the Leipzig Corpora Collection (http://wortschatz.uni-leipzig.de/en). For the zh-CN and zh-TW pair, we use the monolingual data from the Chinese Gigaword corpus. |
| Dataset Splits | Yes | We use another 100k parallel sentences as the development and test set respectively. The dataset contains 250K negative sentences (denoted as X) and 380K positive sentences (denoted as Y), of which 70% are used for training, 10% for development, and the remaining 20% as the test set. For the other dataset, 80% is used for training, 10% for validation, and the remaining 10% for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states, 'We implement our model with the Texar (Hu et al., 2018b) toolbox based on Tensorflow (Abadi et al., 2016)', but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | Our model configurations are included in Appendix B. The encoder and decoder are both 2-layer LSTMs with 256 hidden dimensions. Word embeddings are 256 dimensions. The vocabulary size is 10k. We use Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001. We train for 20 epochs with a batch size of 64. |
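The reported configuration (2-layer LSTM encoder and decoder, 256 hidden dimensions, 256-dimensional word embeddings, 10k vocabulary, Adam with learning rate 0.001, 20 epochs, batch size 64) is concrete enough to express as code. The authors implemented their model with Texar on TensorFlow; the sketch below is only an illustrative PyTorch approximation of those reported sizes, and every name in it (e.g. `StyleTransferSeq2Seq`) is hypothetical rather than taken from the paper's code.

```python
# Minimal sketch of the reported configuration: 2-layer LSTM encoder/decoder,
# 256-dim embeddings and hidden states, 10k vocabulary, Adam @ 1e-3, batch size 64.
# Illustrative only; the paper's implementation uses Texar on TensorFlow.
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000
EMB_DIM = 256
HIDDEN_DIM = 256
NUM_LAYERS = 2

class StyleTransferSeq2Seq(nn.Module):
    """Hypothetical encoder-decoder skeleton matching the reported sizes."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.encoder = nn.LSTM(EMB_DIM, HIDDEN_DIM, num_layers=NUM_LAYERS,
                               batch_first=True)
        self.decoder = nn.LSTM(EMB_DIM, HIDDEN_DIM, num_layers=NUM_LAYERS,
                               batch_first=True)
        self.output_proj = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence; reuse the final LSTM states to seed the decoder.
        _, state = self.encoder(self.embedding(src_ids))
        dec_out, _ = self.decoder(self.embedding(tgt_ids), state)
        return self.output_proj(dec_out)  # unnormalized token logits

model = StyleTransferSeq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reported learning rate

# Illustrative training step on a dummy batch of 64 sentences of length 20.
src = torch.randint(0, VOCAB_SIZE, (64, 20))
tgt = torch.randint(0, VOCAB_SIZE, (64, 20))
logits = model(src, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB_SIZE), tgt.reshape(-1))
loss.backward()
optimizer.step()
```

This sketch covers only the sequence-to-sequence backbone and optimizer settings quoted above; it omits the paper's language-model discriminator and style-transfer training objective, which the summary table does not specify in enough detail to reconstruct.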