Unsupervised Text Generation by Learning from Search

Authors: Jingjing Li, Zichao Li, Lili Mou, Xin Jiang, Michael Lyu, Irwin King

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, unsupervised paraphrasing and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks.
Researcher Affiliation | Collaboration | The Chinese University of Hong Kong, Huawei Noah's Ark Lab, University of Alberta; Alberta Machine Intelligence Institute (Amii)
Pseudocode | Yes | Algorithm 1: Training TGLS
Open Source Code | Yes | Code is available at https://github.com/jingjingli01/TGLS
Open Datasets | Yes | we conducted experiments on the Quora benchmark dataset.
Dataset Splits | Yes | For validation and testing, we had 500 and 170K samples, respectively.
Hardware Specification | Yes | The experiments were conducted on a cluster with Nvidia Tesla V100 GPUs.
Software Dependencies | No | The paper mentions software components like GPT2 and RoBERTa, but does not specify version numbers for these or other libraries and programming languages used to implement the experiments.
Experiment Setup | Yes | For SA, the initial temperature was set to 1e-2 in both tasks. The total search steps and temperature cooling rate were 50 and 2e-4 for paraphrasing, and 100 and 1e-4 for text formalization. The scorers' weights were tuned by grid search, set as (α, β, γ, δ) = (0.8, 1, 0.6, 0.125) for paraphrasing, and (0.8, 2, 1.25, 0.26) for text formalization. We keep the RoBERTa fixed and further tune the GPT2 model by alternations of search and learning for another 6 epochs.
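
To make the reported search settings concrete, below is a minimal, self-contained sketch of a simulated-annealing (SA) search loop using the paraphrasing hyperparameters quoted above (initial temperature 1e-2, cooling 2e-4, 50 steps, weights (0.8, 1, 0.6, 0.125)). The scorer and edit-proposal functions are hypothetical toy stand-ins for the paper's RoBERTa/GPT2-based scorers and word-level edits, and combining the scorers as a weighted product is an assumption made for illustration; this is not the authors' implementation.

    import math
    import random

    # Reported paraphrasing settings (see Experiment Setup row above).
    ALPHA, BETA, GAMMA, DELTA = 0.8, 1.0, 0.6, 0.125
    INIT_TEMP, COOLING, STEPS = 1e-2, 2e-4, 50


    def fitness(candidate: str, source: str) -> float:
        # Toy scorers standing in for semantic similarity, fluency, expression
        # diversity, and length; weighting them as exponents of a product is an
        # assumption for illustration only.
        src, cand = set(source.split()), set(candidate.split())
        semantic = len(src & cand) / max(len(src), 1)
        fluency = 1.0 / (1.0 + abs(len(candidate.split()) - len(source.split())))
        diversity = 1.0 - semantic
        length = min(len(candidate.split()) / max(len(source.split()), 1), 1.0)
        scores = [semantic, fluency, diversity, length]
        weights = [ALPHA, BETA, GAMMA, DELTA]
        return math.prod(max(s, 1e-6) ** w for s, w in zip(scores, weights))


    def propose_edit(candidate: str) -> str:
        # Randomly delete, duplicate, or swap a word -- a toy stand-in for the
        # word-level edits proposed during search.
        words = candidate.split()
        if len(words) < 2:
            return candidate
        i = random.randrange(len(words))
        op = random.choice(["delete", "duplicate", "swap"])
        if op == "delete":
            words.pop(i)
        elif op == "duplicate":
            words.insert(i, words[i])
        else:
            j = random.randrange(len(words))
            words[i], words[j] = words[j], words[i]
        return " ".join(words)


    def sa_search(source: str) -> str:
        # Standard SA acceptance: keep improvements, otherwise accept with
        # probability exp((new - old) / T); the temperature cools linearly.
        current, current_fit = source, fitness(source, source)
        temperature = INIT_TEMP
        for _ in range(STEPS):
            candidate = propose_edit(current)
            candidate_fit = fitness(candidate, source)
            delta = candidate_fit - current_fit
            if delta > 0 or random.random() < math.exp(delta / max(temperature, 1e-8)):
                current, current_fit = candidate, candidate_fit
            temperature = max(temperature - COOLING, 0.0)
        return current


    if __name__ == "__main__":
        print(sa_search("how can i improve my english speaking skills"))

In the full TGLS pipeline, outputs found by such a search are then used as training targets for the GPT2 generator (the "learning from search" step), with search and learning alternated as in Algorithm 1.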