Non-Parallel Text Style Transfer with Self-Parallel Supervision

Authors: Ruibo Liu, Chongyang Gao, Chenyan Jia, Guangxuan Xu, Soroush Vosoughi

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 EXPERIMENTS
Researcher Affiliation | Academia | Dartmouth College; Northwestern University; University of Texas at Austin; University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Sentence Distillation for Political Stance Dataset
Open Source Code | Yes | Code for LaMer is available at https://github.com/DapangLiu/LaMer.
Open Datasets | Yes | Sentiment Transfer. We use the Yelp reviews dataset collected by Shen et al. (2017), which contains 250k negative sentences and 380k positive sentences, organized in a non-parallel fashion. Formality Transfer. A more challenging TST task is to modify the formality of a given sentence. We use the GYAFC dataset (Rao & Tetreault, 2018), which contains formal and informal sentences from two domains.
Dataset Splits | Yes | Formality Transfer... which consists of about 52k training sentences, 5k development sentences, and 2.5k test sentences.
Hardware Specification | Yes | All of our experiments were run on a single RTX-2080 GPU, with batch size 4 and 2/3/2 epochs for LaMer in the above three TST tasks.
Software Dependencies | No | The paper mentions using pre-trained models such as RoBERTa and BART, citing their original papers, but does not provide version numbers for software dependencies or libraries (e.g., PyTorch, TensorFlow, HuggingFace Transformers).
Experiment Setup | Yes | All of our experiments were run on a single RTX-2080 GPU, with batch size 4 and 2/3/2 epochs for LaMer in the above three TST tasks. We choose the REINFORCE algorithm (Williams, 1992) to optimize the current policy πθ. Empirically we set J^safe_IL to {0.8, 0.6, 0.4} for the three TST tasks (sentiment, formality, and political stance). α controls the weights assigned to d_Order and d_Exist; it is set by sweeping α from 0 to 1 in steps of 0.1 and picking the best-performing value with respect to GM: α = {0.4, 0.3, 0.1} for the three tasks. The filtering parameters p and k are hyperparameters that are crucial for the construction of roughly parallel datasets.
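The α selection described in the experiment setup (sweep α from 0 to 1 in 0.1 steps, keep the value with the best GM score) amounts to a one-dimensional grid search. A minimal sketch, assuming a hypothetical `evaluate_gm` callable that trains and evaluates the model at a given α and returns its GM metric:

```python
def select_alpha(evaluate_gm, step=0.1):
    """Grid-search alpha over [0, 1] and return the value with the best GM.

    `evaluate_gm` is a hypothetical stand-in for a full train/eval run at a
    given alpha; the paper reports only the sweep procedure, not an API.
    """
    # round() cleans up float accumulation (e.g. 3 * 0.1 -> 0.30000000000000004)
    candidates = [round(i * step, 1) for i in range(int(1 / step) + 1)]
    scores = {alpha: evaluate_gm(alpha) for alpha in candidates}
    best_alpha = max(scores, key=scores.get)
    return best_alpha, scores[best_alpha]
```

With a toy score function peaking at 0.4, `select_alpha(lambda a: -(a - 0.4) ** 2)` recovers α = 0.4, matching how the paper reports picking the best-performing α per task.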