DiffusER: Diffusion via Edit-based Reconstruction

Authors: Machel Reid, Vincent Josua Hellendoorn, Graham Neubig

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To demonstrate the effectiveness of DIFFUSER, we test our method on three text generation tasks: machine translation, abstractive summarization, and text style transfer, and show on-par or improved performance compared to purely autoregressive, single-pass and non-autoregressive methods."
Researcher Affiliation | Collaboration | Machel Reid, Google Research (machelreid@google.com); Vincent J. Hellendoorn, Software and Societal Systems Department, Carnegie Mellon University (vhellendoorn@cmu.edu); Graham Neubig, Language Technologies Institute, Carnegie Mellon University, and Inspired Cognition (gneubig@cs.cmu.edu)
Pseudocode | No | The paper describes the algorithms and processes in narrative text and through a figure, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "We use the WMT 14 English-German dataset for our machine translation experiments. ... We also benchmark on the CNN/Daily Mail dataset for summarization (Nallapati et al., 2016). ... We perform experiments using the Yelp (Shen et al., 2017) dataset for the unsupervised text-style transfer task."
Dataset Splits | Yes | "We use the WMT 14 English-German dataset for our machine translation experiments. We use the same preprocessing and post-processing steps as Ghazvininejad et al. (2019). ... We also benchmark on the CNN/Daily Mail dataset for summarization (Nallapati et al., 2016). ... We use the Transformer-base encoder-decoder (Vaswani et al., 2017) architecture, with 6 layers, a hidden dimension of 512, feedforward dimension of 2048, 8 attention heads, and dropout p = 0.3."
Hardware Specification | Yes | "Relative time (seconds) comparison between decoding methods, measured on a single V100 GPU."
Software Dependencies | No | The paper mentions using the 'Transformer-base encoder-decoder (Vaswani et al., 2017) architecture' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We use the Transformer-base encoder-decoder (Vaswani et al., 2017) architecture, with 6 layers, a hidden dimension of 512, feedforward dimension of 2048, 8 attention heads, and dropout p = 0.3. ... we use 12 diffusion steps, b = 5, and r = 3 for beam search, and Et(60% KEEP, 20% REPLACE, 10% INSERT, 10% DELETE) based on numbers from preliminary experiments. ... We use a Poisson distribution El(λ = 3) over edit operation lengths in our corruption process. ... We use a Poisson distribution El(λ = 8) over edit operation lengths in our corruption process."
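
Because the paper provides neither pseudocode nor released code, the following is a minimal sketch of what the corruption (forward) step quoted in the Experiment Setup row could look like, assuming edit types drawn from Et (60% KEEP, 20% REPLACE, 10% INSERT, 10% DELETE) and span lengths drawn from a Poisson distribution El(λ). All names, the `<mask>` placeholder token, and the span-handling details are assumptions for illustration, not the authors' implementation.

```python
import math
import random

# Minimal sketch, assuming the corruption step described in the Experiment
# Setup row: edit types sampled from E_t = (60% KEEP, 20% REPLACE, 10% INSERT,
# 10% DELETE) and edit-span lengths from a Poisson distribution E_l(lambda).
# The paper releases no code, so everything below is an assumption.

EDIT_TYPES = ["KEEP", "REPLACE", "INSERT", "DELETE"]
EDIT_PROBS = [0.60, 0.20, 0.10, 0.10]  # E_t from the quoted setup
MASK = "<mask>"                        # hypothetical placeholder token


def sample_span_length(lam: float) -> int:
    """Draw an edit-span length from Poisson(lam), clamped to >= 1 (assumption)."""
    # Knuth's method; avoids a numpy dependency.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return max(k - 1, 1)


def corrupt(tokens: list[str], lam: float = 3.0) -> list[str]:
    """Apply one forward (corruption) step to a token sequence."""
    out, i = [], 0
    while i < len(tokens):
        op = random.choices(EDIT_TYPES, weights=EDIT_PROBS, k=1)[0]
        span = sample_span_length(lam)
        if op == "KEEP":            # copy the span unchanged
            out.extend(tokens[i:i + span])
            i += span
        elif op == "REPLACE":       # overwrite the span with placeholders
            out.extend([MASK] * len(tokens[i:i + span]))
            i += span
        elif op == "INSERT":        # add placeholders without consuming input
            out.extend([MASK] * span)
        else:                       # DELETE: drop the span entirely
            i += span
    return out


if __name__ == "__main__":
    random.seed(0)
    print(corrupt("the cat sat on the mat".split(), lam=3.0))
```

The quoted setup mentions both El(λ = 3) and El(λ = 8) for the corruption processes of different tasks; the reverse reconstruction model, the 12 diffusion steps, and the beam-search decoding (b = 5, r = 3) are not sketched here.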