DiffusER: Diffusion via Edit-based Reconstruction
Authors: Machel Reid, Vincent Josua Hellendoorn, Graham Neubig
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of DIFFUSER, we test our method on three text generation tasks: machine translation, abstractive summarization, and text style transfer, and show on-par or improved performance compared to purely autoregressive, single-pass and non-autoregressive methods. |
| Researcher Affiliation | Collaboration | Machel Reid, Google Research (machelreid@google.com); Vincent J. Hellendoorn, Software and Societal Systems Department, Carnegie Mellon University (vhellendoorn@cmu.edu); Graham Neubig, Language Technologies Institute, Carnegie Mellon University / Inspired Cognition (gneubig@cs.cmu.edu) |
| Pseudocode | No | The paper describes the algorithms and processes in narrative text and through a figure, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use the WMT 14 English-German dataset for our machine translation experiments. ... We also benchmark on the CNN/Daily Mail dataset for summarization (Nallapati et al., 2016). ... We perform experiments using the Yelp (Shen et al., 2017) dataset for the unsupervised text-style transfer task. |
| Dataset Splits | Yes | We use the WMT 14 English-German dataset for our machine translation experiments. We use the same preprocessing and post-processing steps as Ghazvininejad et al. (2019). ... We also benchmark on the CNN/Daily Mail dataset for summarization (Nallapati et al., 2016). ... We perform experiments using the Yelp (Shen et al., 2017) dataset for the unsupervised text-style transfer task. |
| Hardware Specification | Yes | Relative time (seconds) comparison between decoding methods, measured on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using 'Transformer-base encoder-decoder (Vaswani et al., 2017) architecture' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use the Transformer-base encoder-decoder (Vaswani et al., 2017) architecture, with 6 layers, a hidden dimension of 512, feedforward dimension of 2048, 8 attention heads, and dropout p = 0.3. ... we use 12 diffusion steps, b = 5, and r = 3 for beam search, and E_t (60% KEEP, 20% REPLACE, 10% INSERT, 10% DELETE) based on numbers from preliminary experiments. ... We use a Poisson distribution E_l(λ = 3) over edit operation lengths in our corruption process. ... We use a Poisson distribution E_l(λ = 8) over edit operation lengths in our corruption process. |
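
For readers who want to collect the reported settings in one place, the sketch below gathers the hyperparameters quoted in the Experiment Setup row. Since no code is released, the class and field names are illustrative assumptions rather than the authors' API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffusERConfig:
    """Hyperparameters as quoted above; names are hypothetical, not the authors'."""
    # Transformer-base encoder-decoder (Vaswani et al., 2017)
    layers: int = 6
    hidden_dim: int = 512
    ffn_dim: int = 2048
    attention_heads: int = 8
    dropout: float = 0.3
    # Decoding settings quoted for machine translation
    diffusion_steps: int = 12
    beam_size: int = 5   # b = 5
    beam_r: int = 3      # r = 3 for beam search
    # Edit-type distribution E_t used during corruption (KEEP, REPLACE, INSERT, DELETE)
    edit_type_probs: tuple = (0.60, 0.20, 0.10, 0.10)
    # Poisson rate for edit-span lengths E_l (lambda = 3 for MT, 8 for summarization)
    edit_length_lambda: float = 3.0


if __name__ == "__main__":
    print(DiffusERConfig())
```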
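
The quoted corruption process can also be sketched procedurally: edit types are drawn from E_t and edit-span lengths from a Poisson distribution E_l. The function below is a minimal, assumption-laden reading of that sampling step, not the authors' implementation; its name and the way spans are counted against the sequence are our own choices.

```python
import numpy as np

EDIT_TYPES = ["KEEP", "REPLACE", "INSERT", "DELETE"]
EDIT_PROBS = [0.60, 0.20, 0.10, 0.10]  # E_t as quoted above


def sample_corruption_plan(num_tokens, length_lambda=3.0, seed=None):
    """Sample (edit_type, span_length) pairs covering a sequence of num_tokens tokens.

    Span lengths follow a Poisson distribution E_l (lambda = 3 for translation,
    8 for summarization, per the quoted settings).
    """
    rng = np.random.default_rng(seed)
    plan, covered = [], 0
    while covered < num_tokens:
        edit = rng.choice(EDIT_TYPES, p=EDIT_PROBS)
        span = max(1, int(rng.poisson(length_lambda)))
        if edit != "INSERT":  # INSERT adds new tokens, consumes none of the originals
            span = min(span, num_tokens - covered)
            covered += span
        plan.append((edit, span))
    return plan


if __name__ == "__main__":
    # Example: a corruption plan for a 20-token target with the MT setting.
    print(sample_corruption_plan(num_tokens=20, length_lambda=3.0, seed=0))
```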