Discrete Modeling via Boundary Conditional Diffusion Processes
Authors: Yuxuan Gu, Xiaocheng Feng, Lei Huang, Yingsheng Wu, Zekun Zhou, Weihong Zhong, Kun Zhu, Bing Qin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that our approach achieves strong performance in both language modeling and discrete image generation tasks. In language modeling, our approach surpasses previous state-of-the-art continuous diffusion language models in three translation tasks and a summarization task, while also demonstrating competitive performance compared to auto-regressive transformers. Moreover, our method achieves comparable results to continuous diffusion models when using discrete ordinal pixels and establishes a new state-of-the-art for categorical image generation on the CIFAR-10 dataset. |
| Researcher Affiliation | Collaboration | Harbin Institute of Technology; Peng Cheng Laboratory. {yxgu,xcfeng,lhuang,yswu,zkzhou,whzhong,kzhu,qinb}@ir.hit.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training; Algorithm 2 Sampling; Algorithm 3 Gaussian Sampling |
| Open Source Code | Yes | Our framework is a module constructed on current diffusion models. We demonstrate our kernel part rescale diffusion trajectory with pseudo Python code as below: ... and we will publish our code on github.com. |
| Open Datasets | Yes | Our approach is experimented in both language modeling and discrete image generation. On three machine translation datasets (IWSLT14 DE-EN [Cettolo et al., 2012], WMT14 EN-DE, WMT16 EN-RO) and a text summarization dataset (GIGAWORD [Rush et al., 2015]) for language modeling... For image generation on CIFAR-10 [Krizhevsky et al., 2009]... |
| Dataset Splits | Yes | Datasets used for experiments include three translation tasks (IWSLT14 DE-EN [Cettolo et al., 2012], WMT14 EN-DE, and WMT16 EN-RO) and one text summarization task (GIGAWORD [Rush et al., 2015]) for language modeling, our proposed approach... We use CIFAR-10 [Krizhevsky et al., 2009] for discrete image generation. |
| Hardware Specification | Yes | Our experiments are performed with Nvidia 80G A100. Each language result requires about 2 days on one single A100. Each image result requires about a week on one single A100. |
| Software Dependencies | No | The paper mentions 'FAIRSEQ framework' but does not specify its version or the versions of other software dependencies like Python, PyTorch, etc. |
| Experiment Setup | Yes | During training, the diffusion step is T = 2000 and the confidence factor r = 1 for translation tasks since they have strong conditions, while r = 0.5 for summarization. Sentences are generated deterministically with 20 steps. ... The model is trained for 1.5M steps with the learning rate of 1e-4 and batch size of 128. |
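
For reference, the reported experiment-setup values can be collected into a small configuration object. This is a minimal illustrative sketch, not the authors' released code; the class and field names (e.g. `BoundaryDiffusionConfig`, `confidence_factor`) are assumptions, and only the numeric values are taken from the quoted setup above.

```python
from dataclasses import dataclass


@dataclass
class BoundaryDiffusionConfig:
    """Hyperparameters quoted in the Experiment Setup row (names are assumed)."""
    diffusion_steps: int = 2000        # training diffusion step T
    confidence_factor: float = 1.0     # r = 1 for translation (strong conditions)
    sampling_steps: int = 20           # sentences generated deterministically in 20 steps
    train_steps: int = 1_500_000       # 1.5M training steps
    learning_rate: float = 1e-4
    batch_size: int = 128


# Summarization (GIGAWORD) reportedly uses a weaker condition, so r = 0.5.
gigaword_cfg = BoundaryDiffusionConfig(confidence_factor=0.5)
```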