Text Diffusion with Reinforced Conditioning
Authors: Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the competitiveness of TREC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples. We conduct a series of experiments on various tasks of NLG, including machine translation, paraphrasing, and question generation. |
| Researcher Affiliation | Collaboration | Yuxuan Liu (1), Tianchi Yang (2), Shaohan Huang (2), Zihan Zhang (2), Haizhen Huang (2), Furu Wei (2), Weiwei Deng (2), Feng Sun (2), Qi Zhang (2); (1) Peking University, (2) Microsoft Corporation; yx.liu@stu.pku.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that code is available. |
| Open Datasets | Yes | Specifically, we select tasks mainly following previous works (Gu et al. 2018; Ghazvininejad et al. 2019; Gong et al. 2023), including IWSLT14 De-En (Cettolo et al. 2014) and WMT14 En-De (Bojar et al. 2014) for translation, Quasar-T (Dhingra, Mazaitis, and Cohen 2017) for question generation, and Quora (QQP) (Chen et al. 2018) for paraphrase. |
| Dataset Splits | No | The paper mentions using datasets like IWSLT14, WMT14, Quasar-T, and QQP, and refers to a 'valid set' in Figure 1. However, it does not provide specific percentages or counts for training, validation, and test splits for these datasets within the text. |
| Hardware Specification | Yes | We train our model on 4 V100 GPUs. |
| Software Dependencies | No | The paper mentions tools like 'moses' for tokenization and 'Adam' as an optimizer, and refers to 'transformer-base' architecture. However, it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | We adapt the transformer-base (Vaswani et al. 2017) architecture for TREC (n_layers = 12, d_model = 512, n_heads = 8, d_FFN = 2048), and set embedding dimension d = 128. For IWSLT, we reduce n_heads and d_FFN to 4 and 1024. We take 2000 diffusion steps during training, 20 during sampling, and apply a sqrt schedule (Li et al. 2022). For time-aware variance scaling, we pick k_1 = 3 and k_2 = 7.5e4 based on preliminary experiments. ... We apply a learning rate of 5e-4 (2e-4 for Quasar-T), 10K warmup steps (30K for Quasar-T), and apply the Adam (Kingma and Ba 2015) optimizer. |
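
The Experiment Setup row above lists the reported hyperparameters as free text. As a reading aid, here is a minimal sketch that collects them into a single configuration object and computes the referenced sqrt noise schedule. The `TRECConfig` class, its field names, and the `sqrt_alpha_bar` helper are illustrative assumptions (the authors' code is not released), and the schedule follows the common Diffusion-LM formulation ᾱ_t = 1 − √(t/T + s) rather than anything stated in the paper itself.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TRECConfig:
    """Hyperparameters as reported in the paper; field names are illustrative."""
    # transformer-base backbone (Vaswani et al. 2017)
    n_layers: int = 12
    d_model: int = 512
    n_heads: int = 8             # reduced to 4 for IWSLT14
    d_ffn: int = 2048            # reduced to 1024 for IWSLT14
    d_embedding: int = 128
    # diffusion process
    train_diffusion_steps: int = 2000
    sampling_steps: int = 20
    noise_schedule: str = "sqrt"     # sqrt schedule of Li et al. (2022)
    # time-aware variance scaling
    k1: float = 3.0
    k2: float = 7.5e4
    # optimization with Adam (Kingma and Ba 2015)
    learning_rate: float = 5e-4      # 2e-4 for Quasar-T
    warmup_steps: int = 10_000       # 30_000 for Quasar-T


def sqrt_alpha_bar(num_steps: int, s: float = 1e-4) -> np.ndarray:
    """Cumulative signal level ᾱ_t for a sqrt noise schedule.

    Assumes the Diffusion-LM formulation ᾱ_t = 1 - sqrt(t / T + s),
    where the small offset s keeps ᾱ near t = 0 slightly below 1.
    """
    t = np.arange(1, num_steps + 1)
    return 1.0 - np.sqrt(t / num_steps + s)


config = TRECConfig()
alpha_bar = sqrt_alpha_bar(config.train_diffusion_steps)  # 2000 training steps
```

The dataset-specific overrides quoted in the table (fewer heads and a smaller FFN for IWSLT14, a lower learning rate and longer warmup for Quasar-T) would be applied on top of these defaults; how the 20 sampling steps are respaced from the 2000 training steps is not specified in the paper.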