PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

Authors: Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Joshua Susskind, Navdeep Jaitly

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method is evaluated on various conditional generation tasks, and results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text in an efficient manner.
Researcher Affiliation | Industry | Yizhe Zhang, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly. Apple. {yizzhang, jgu32, zhuofeng_wu, szhai, jsusskind, njaitly}@apple.com
Pseudocode | No | No pseudocode or algorithm blocks found.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets | Yes | For the Sentiment-guided generation task, we used the Trip Advisor dataset provided by (Li et al., 2014). For the text completion task, our model was assessed on two datasets: 1) the aforementioned Trip Advisor review dataset... and 2) one-tenth of the overall C4 datasets (Raffel et al., 2020)... For the summarization task, we use CNN/Daily Mail (Hermann et al., 2015) and XSum (Narayan et al., 2018).
Dataset Splits | Yes | The datasets were partitioned into training, validation, and test in the ratios of (0.96, 0.02, 0.02). (See the split sketch below the table.)
Hardware Specification | Yes | We conducted inference time benchmarks of each method on a single Nvidia A100.
Software Dependencies | No | The paper mentions software components like BERT-large, GPT-medium, T5-large, and PyTorch, but does not provide specific version numbers for these or other dependencies.
Experiment Setup | Yes | The embedding dimension h was 1024, and the number of paragraph embeddings k was set to 16, as increasing the number did not result in significant improvement in performance. We provide more analysis on the impact of k in App. A.2. The learning rate was set to 2e-4, and β was set to 5e-6. For the latent diffusion model, the channel size was set to 1024 to match the embedding dimension h, and the number of heads was set to 16 with 28 transformer layers. The total size of the latent diffusion model was 533M. The feature encoder was also jointly learned, and was initialized with a T5-large encoder. We use DDIM throughout our experiments as it shows better performance than DDPM. In all our experiments, we use 30 diffusion steps to generate the final z, which strikes a good balance among efficiency, diversity, and relevance. In comparison, Diff-LM (Li et al., 2022) and Genie (Lin et al., 2022) report using 200 steps and 2000 steps, respectively, to generate high-quality text. We set the CFG weights to 2 and 5 for the text completion and summarization tasks, respectively, based on generation performance on the validation set.
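
The Dataset Splits row above quotes only the (0.96, 0.02, 0.02) ratios; the paper does not describe how the partition was produced. The sketch below is a minimal, hypothetical illustration assuming a seeded, example-level random shuffle; the function name and the seed value are our own and not taken from the paper.

```python
import random

def split_dataset(examples, ratios=(0.96, 0.02, 0.02), seed=0):
    """Partition a list of examples into train/validation/test subsets
    using the ratios reported in the paper (procedure assumed, not stated)."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```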
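
For convenience, the hyperparameters quoted in the Experiment Setup row are collected below into a single hypothetical configuration object. The field names are ours (no official code is released), and reading β as the KL-term weight of the variational paragraph embedder is an assumption; the values themselves are taken directly from the text above.

```python
from dataclasses import dataclass

@dataclass
class PlannerConfig:
    # Paragraph (latent) embeddings
    latent_dim: int = 1024            # embedding dimension h
    num_latents: int = 16             # number of paragraph embeddings k
    # Training of the paragraph embedder
    learning_rate: float = 2e-4
    kl_beta: float = 5e-6             # β (assumed to weight the KL term)
    feature_encoder_init: str = "t5-large"  # encoder jointly learned, T5-large init
    # Latent diffusion transformer (~533M parameters)
    channels: int = 1024              # channel size matches h
    num_heads: int = 16
    num_layers: int = 28
    # Sampling
    sampler: str = "DDIM"             # reported to outperform DDPM here
    diffusion_steps: int = 30         # vs. 200 (Diff-LM) and 2000 (Genie)
    cfg_weight_completion: float = 2.0     # classifier-free guidance, text completion
    cfg_weight_summarization: float = 5.0  # classifier-free guidance, summarization
```

Note that the comparatively small number of diffusion steps (30 with DDIM, versus 200 for Diff-LM and 2000 for Genie) and the task-dependent CFG weights are the settings the authors highlight for the reported efficiency and generation quality.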