Discrete Modeling via Boundary Conditional Diffusion Processes

Authors: Yuxuan Gu, Xiaocheng Feng, Lei Huang, Yingsheng Wu, Zekun Zhou, Weihong Zhong, Kun Zhu, Bing Qin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results indicate that our approach achieves strong performance in both language modeling and discrete image generation tasks. In language modeling, our approach surpasses previous state-of-the-art continuous diffusion language models in three translation tasks and a summarization task, while also demonstrating competitive performance compared to auto-regressive transformers. Moreover, our method achieves comparable results to continuous diffusion models when using discrete ordinal pixels and establishes a new state-of-the-art for categorical image generation on the CIFAR-10 dataset.
Researcher Affiliation | Collaboration | Harbin Institute of Technology; Peng Cheng Laboratory; {yxgu,xcfeng,lhuang,yswu,zkzhou,whzhong,kzhu,qinb}@ir.hit.edu.cn
Pseudocode | Yes | Algorithm 1: Training; Algorithm 2: Sampling; Algorithm 3: Gaussian Sampling
Open Source Code | Yes | Our framework is a module constructed on current diffusion models. We demonstrate our kernel part, rescale diffusion trajectory, with pseudo Python code as below: ... and we will public our code on github.com. (An illustrative rescaling sketch appears after this table.)
Open Datasets | Yes | Our approach is experimented in both language modeling and discrete image generation. On three machine translation datasets (IWSLT14 DE-EN [Cettolo et al., 2012], WMT14 EN-DE, WMT16 EN-RO) and a text summarization dataset (GIGAWORD [Rush et al., 2015]) for language modeling... For image generation on CIFAR-10 [Krizhevsky et al., 2009]...
Dataset Splits | Yes | Datasets used for experiments include three translation tasks (IWSLT14 DE-EN [Cettolo et al., 2012], WMT14 EN-DE, and WMT16 EN-RO) and one text summarization task (GIGAWORD [Rush et al., 2015]) for language modeling... We use CIFAR-10 [Krizhevsky et al., 2009] for discrete image generation. (A dataset-loading example appears after this table.)
Hardware Specification | Yes | Our experiments are performed with Nvidia 80G A100. Each language result requires about 2 days on one single A100. Each image result requires about a week on one single A100.
Software Dependencies | No | The paper mentions the FAIRSEQ framework but does not specify its version or the versions of other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | During training, the diffusion step is T = 2000 and the confidence factor r = 1 for translation tasks since they have strong conditions, while r = 0.5 for summarization. Sentences are generated deterministically with 20 steps. ... The model is trained for 1.5M steps with a learning rate of 1e-4 and a batch size of 128. (A configuration sketch collecting these values appears after this table.)
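
The paper's pseudo Python code for rescaling the diffusion trajectory is elided above ("..."). As a rough illustration only, the sketch below shows one way a boundary-conditioned rescaling could look: a standard Gaussian forward step is pulled toward a given boundary point rather than toward pure noise. The function name `rescale_trajectory`, the `boundary_point` argument, and the exact rescaling rule are assumptions for illustration, not the authors' released code.

```python
# Hypothetical sketch of a boundary-conditioned forward trajectory.
# NOT the authors' released code; the rescaling rule here is an assumption.
import torch

def rescale_trajectory(x0, boundary_point, alphas_cumprod, t):
    """Shift a standard Gaussian forward step so the trajectory is biased
    toward an assumed boundary point of the discrete region containing x0.

    x0:             clean (embedded) data, shape (batch, ..., dim)
    boundary_point: assumed boundary of the discrete region, same shape as x0
    alphas_cumprod: 1-D tensor of cumulative noise-schedule products, length T
    t:              integer timestep indices, shape (batch,)
    """
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    # Standard DDPM-style mean, then interpolated toward the boundary point.
    mean = a_bar.sqrt() * x0 + (1.0 - a_bar.sqrt()) * boundary_point
    x_t = mean + (1.0 - a_bar).sqrt() * noise
    return x_t
```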
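The "Experiment Setup" row quotes T = 2000 diffusion steps, a confidence factor r (1 for translation, 0.5 for summarization), 20 deterministic generation steps, 1.5M training steps, a learning rate of 1e-4, and a batch size of 128. A minimal sketch collecting those values into a configuration object is shown below; the field names and the per-task instances are assumptions made for readability, not taken from the paper.

```python
# Hypothetical configuration mirroring the hyperparameters quoted above.
# Field names are assumptions; only the numeric values come from the table.
from dataclasses import dataclass

@dataclass
class DiffusionConfig:
    diffusion_steps: int = 2000      # T = 2000 during training
    confidence_factor: float = 1.0   # r = 1 for translation, 0.5 for summarization
    sampling_steps: int = 20         # sentences generated deterministically in 20 steps
    train_steps: int = 1_500_000     # 1.5M optimizer updates
    learning_rate: float = 1e-4
    batch_size: int = 128

translation_cfg = DiffusionConfig(confidence_factor=1.0)
summarization_cfg = DiffusionConfig(confidence_factor=0.5)
```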
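The dataset rows cite CIFAR-10 for discrete image generation with its standard train/test split. The paper does not say how the data was loaded, so the snippet below is only one common way to obtain the splits, using torchvision.

```python
# One common way to obtain the standard CIFAR-10 splits (not necessarily
# how the authors loaded the data).
import torchvision
import torchvision.transforms as transforms

to_tensor = transforms.ToTensor()  # [0, 1] float tensors; re-quantize to 0-255 ints for discrete pixels

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=to_tensor)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                         download=True, transform=to_tensor)

print(len(train_set), len(test_set))  # 50000 train / 10000 test images
```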