Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Text Diffusion with Reinforced Conditioning
Authors: Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the competitiveness of TREC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples. We conduct a series of experiments on various tasks of NLG, including machine translation, paraphrasing, and question generation. |
| Researcher Affiliation | Collaboration | Yuxuan Liu¹, Tianchi Yang², Shaohan Huang², Zihan Zhang², Haizhen Huang², Furu Wei², Weiwei Deng², Feng Sun², Qi Zhang². ¹Peking University, ²Microsoft Corporation. EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that code is available. |
| Open Datasets | Yes | Specifically, we select tasks mainly following previous works (Gu et al. 2018; Ghazvininejad et al. 2019; Gong et al. 2023), including IWSLT14 De-En (Cettolo et al. 2014) and WMT14 En-De (Bojar et al. 2014) for translation, Quasar-T (Dhingra, Mazaitis, and Cohen 2017) for question generation, and Quora (QQP) (Chen et al. 2018) for paraphrase. |
| Dataset Splits | No | The paper mentions using datasets like IWSLT14, WMT14, Quasar-T, and QQP, and refers to a 'valid set' in Figure 1. However, it does not provide specific percentages or counts for training, validation, and test splits for these datasets within the text. |
| Hardware Specification | Yes | We train our model on 4 V100 GPUs. |
| Software Dependencies | No | The paper mentions tools like 'moses' for tokenization and 'Adam' as an optimizer, and refers to 'transformer-base' architecture. However, it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | We adapt transformer-base (Vaswani et al. 2017) architecture of TREC ($n_{\text{layers}} = 12$, $d_{\text{model}} = 512$, $n_{\text{heads}} = 8$, $d_{\text{FFN}} = 2048$), and set embedding dimension $d = 128$. For IWSLT, we reduce $n_{\text{heads}}$ and $d_{\text{FFN}}$ to 4 and 1024. We take 2000 diffusion steps during training, 20 during sampling, and apply a sqrt schedule (Li et al. 2022). For time-aware variance scaling, we pick $k_1 = 3$ and $k_2 = 7.5 \times 10^{4}$ based on preliminary experiments. ... We apply a learning rate of 5e-4 (2e-4 for Quasar-T), 10K warmup steps (30K for Quasar-T), and apply the Adam (Kingma and Ba 2015) optimizer. |
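
Since no source code is released, the hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object for anyone attempting a re-implementation. The sketch below is illustrative only: the class name `TrecConfig`, its field names, and the function `sqrt_alpha_bar` are hypothetical, and the quoted setup contains an elision, so further settings may exist. The schedule function assumes the sqrt schedule takes the form commonly attributed to Li et al. (2022), with a small offset and a clamp at zero that are assumptions of this sketch rather than details confirmed by the paper.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrecConfig:
    """Hyperparameters quoted in the table; all field names are hypothetical."""
    n_layers: int = 12
    d_model: int = 512
    n_heads: int = 8
    d_ffn: int = 2048
    embed_dim: int = 128
    train_diffusion_steps: int = 2000
    sample_diffusion_steps: int = 20
    k1: float = 3.0            # time-aware variance scaling constant
    k2: float = 7.5e4          # time-aware variance scaling constant
    learning_rate: float = 5e-4
    warmup_steps: int = 10_000


# Default values as quoted in the table.
BASE = TrecConfig()
# IWSLT reduces the number of heads and the feed-forward width.
IWSLT = replace(BASE, n_heads=4, d_ffn=1024)
# Quasar-T uses a lower learning rate and a longer warmup.
QUASAR_T = replace(BASE, learning_rate=2e-4, warmup_steps=30_000)


def sqrt_alpha_bar(t: int, total_steps: int, offset: float = 1e-4) -> float:
    """Assumed sqrt schedule: cumulative signal level 1 - sqrt(t/T + offset).

    The offset value and the clamp at zero are assumptions of this sketch.
    """
    return max(0.0, 1.0 - math.sqrt(t / total_steps + offset))


if __name__ == "__main__":
    for step in (0, 500, 1000, 2000):
        level = sqrt_alpha_bar(step, BASE.train_diffusion_steps)
        print(step, round(level, 4))
```

Keeping the per-task variants as `replace` calls on a frozen base makes the differences between datasets explicit, which is the information a reader needs when the paper reports them only in prose.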