Text Diffusion with Reinforced Conditioning

Authors: Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate the competitiveness of TREC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples. We conduct a series of experiments on various tasks of NLG, including machine translation, paraphrasing, and question generation.
Researcher Affiliation | Collaboration | Yuxuan Liu (Peking University); Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang (Microsoft Corporation); contact: yx.liu@stu.pku.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that code is available.
Open Datasets | Yes | Specifically, we select tasks mainly following previous works (Gu et al. 2018; Ghazvininejad et al. 2019; Gong et al. 2023), including IWSLT14 De-En (Cettolo et al. 2014) and WMT14 En-De (Bojar et al. 2014) for translation, Quasar-T (Dhingra, Mazaitis, and Cohen 2017) for question generation, and Quora (QQP) (Chen et al. 2018) for paraphrase.
Dataset Splits | No | The paper mentions using datasets like IWSLT14, WMT14, Quasar-T, and QQP, and refers to a 'valid set' in Figure 1. However, it does not provide specific percentages or counts for the training, validation, and test splits of these datasets within the text.
Hardware Specification | Yes | We train our model on 4 V100 GPUs.
Software Dependencies | No | The paper mentions 'moses' for tokenization, Adam as the optimizer, and the transformer-base architecture. However, it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, or other library versions).
Experiment Setup | Yes | We adapt transformer-base (Vaswani et al. 2017) architecture of TREC (n_layers = 12, d_model = 512, n_heads = 8, d_FFN = 2048), and set embedding dimension d = 128. For IWSLT, we reduce n_heads and d_FFN to 4 and 1024. We take 2000 diffusion steps during training, 20 during sampling, and apply a sqrt schedule (Li et al. 2022). For time-aware variance scaling, we pick k1 = 3 and k2 = 7.5e4 based on preliminary experiments. ... We apply a learning rate of 5e-4 (2e-4 for Quasar-T), 10K warmup steps (30K for Quasar-T), and apply the Adam (Kingma and Ba 2015) optimizer.
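
For reference, the hyperparameters quoted above can be collected into a single configuration. The sketch below is not the authors' code: the key names and the sqrt_alpha_bar helper (including its offset s) are assumptions; the numeric values are those reported in the paper, and the sqrt schedule follows the form introduced in Diffusion-LM (Li et al. 2022).

```python
# Minimal sketch of the reported TREC training setup (not the authors' code).
# Key names and the schedule offset `s` are assumptions; numbers come from the paper.
import math

TREC_CONFIG = {
    # transformer-base backbone (Vaswani et al. 2017)
    "n_layers": 12,
    "d_model": 512,
    "n_heads": 8,              # reduced to 4 for IWSLT14 De-En
    "d_ffn": 2048,             # reduced to 1024 for IWSLT14 De-En
    "embedding_dim": 128,
    # diffusion process
    "train_diffusion_steps": 2000,
    "sampling_steps": 20,
    "noise_schedule": "sqrt",  # Li et al. 2022
    # time-aware variance scaling
    "k1": 3.0,
    "k2": 7.5e4,
    # optimization
    "optimizer": "adam",       # Kingma and Ba 2015
    "learning_rate": 5e-4,     # 2e-4 for Quasar-T
    "warmup_steps": 10_000,    # 30,000 for Quasar-T
    "hardware": "4x V100 GPUs",
}


def sqrt_alpha_bar(t: int, T: int = 2000, s: float = 1e-4) -> float:
    """sqrt noise schedule as in Diffusion-LM (Li et al. 2022):
    alpha_bar(t) = 1 - sqrt(t / T + s); the offset s = 1e-4 is an assumption."""
    return 1.0 - math.sqrt(t / T + s)
```

Note that only the values listed here are given in the paper; software versions and dataset split sizes would still have to be chosen by a re-implementer.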