Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning
Authors: Ting Chen, Ruixiang Zhang, Geoffrey Hinton
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with two different discrete data generation tasks, namely discrete/categorical image generation, and image captioning (image-conditional text generation); Table 1: Comparison of FIDs on unconditional and class-conditional CIFAR-10. |
| Researcher Affiliation | Industry | Ting Chen, Ruixiang Zhang, Geoffrey Hinton (Google Research, Brain Team) {iamtingchen,ruixiangz,geoffhinton}@google.com |
| Pseudocode | Yes | Algorithm 1: Bit Diffusion training algorithm; Algorithm 2: Bit Diffusion sampling algorithm (a sketch of the training step appears after this table). |
| Open Source Code | Yes | Code at https://github.com/google-research/pix2seq. |
| Open Datasets | Yes | Datasets: We use CIFAR-10 (Krizhevsky et al., 2009) and ImageNet 64×64 (Deng et al., 2009) for image generation experiments. For image captioning, following (Chen et al., 2022), we use MS-COCO 2017 captioning dataset (Lin et al., 2014). |
| Dataset Splits | No | Datasets: We use CIFAR-10 (Krizhevsky et al., 2009) and ImageNet 64×64 (Deng et al., 2009) for image generation experiments. We adopt widely used FID (Heusel et al., 2017) as the main evaluation metric, and it is computed between 50K generated samples and the whole training set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | We use sentencepiece (Kudo & Richardson, 2018)... We train our models with the Adam optimizer (Kingma & Ba, 2014)... import tensorflow as tf. |
| Experiment Setup | Yes | For CIFAR-10, we train the model for 1.5M steps with a constant learning rate of 0.0001 and batch size of 128. For ImageNet 64×64, we train the model for 500K steps with a constant learning rate of 0.0002 and batch size of 1024. For Bit Diffusion, we use Self-Conditioning by default, unless otherwise specified. We use an exponential moving average of the weights during training with a decay factor of 0.9999. |
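
The Pseudocode row cites Algorithm 1 (training) and Algorithm 2 (sampling). To make that concrete, below is a minimal NumPy sketch of the analog-bits conversion and a self-conditioned training step. It is an illustration only, not the authors' released TensorFlow code from the pix2seq repository: `int2bit`, `bit2int`, `denoise_fn`, and the cosine noise schedule are names and assumptions chosen for this sketch.

```python
import numpy as np

def int2bit(x, n_bits=8, scale=1.0):
    """Map integers in [0, 2**n_bits) to 'analog bits' in {-scale, +scale}."""
    x = np.asarray(x, dtype=np.int64)
    shifts = np.arange(n_bits - 1, -1, -1)                 # most-significant bit first
    bits = (x[..., None] >> shifts) & 1                    # shape (..., n_bits), values in {0, 1}
    return (bits.astype(np.float32) * 2.0 - 1.0) * scale   # shift/scale to real-valued bits

def bit2int(analog_bits):
    """Threshold analog bits at 0 and reassemble the integers (used after sampling)."""
    bits = (np.asarray(analog_bits) > 0).astype(np.int64)
    shifts = np.arange(bits.shape[-1] - 1, -1, -1)
    return np.sum(bits << shifts, axis=-1)

def train_step(denoise_fn, x_int, n_bits=8, self_cond_rate=0.5, rng=np.random):
    """One self-conditioned Bit Diffusion training step (sketch of the quoted Algorithm 1)."""
    x0 = int2bit(x_int, n_bits)                            # real-valued bit representation of the data
    t = rng.uniform(size=(x0.shape[0],) + (1,) * (x0.ndim - 1))
    gamma = np.cos(t * np.pi / 2.0) ** 2                   # assumed cosine noise schedule
    eps = rng.normal(size=x0.shape).astype(np.float32)
    x_t = np.sqrt(gamma) * x0 + np.sqrt(1.0 - gamma) * eps

    # Self-conditioning: with some probability, first predict x0 with an all-zero
    # condition, then condition the real prediction on that (gradient-stopped) estimate.
    x0_cond = np.zeros_like(x0)
    if rng.uniform() < self_cond_rate:
        x0_cond = denoise_fn(x_t, t, x0_cond)              # stop-gradient in a real framework

    x0_pred = denoise_fn(x_t, t, x0_cond)
    return np.mean((x0_pred - x0) ** 2)                    # L2 regression loss on analog bits
```

A quick round trip, e.g. `bit2int(int2bit(np.array([0, 7, 255])))`, recovers the original integers, which is all the sampling algorithm needs in order to decode generated analog bits back into discrete values.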
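
The Experiment Setup row lists the main hyper-parameters in prose; restated as a compact configuration sketch (the dictionary and key names are illustrative and not taken from the released code), combined with the Adam optimizer mentioned under Software Dependencies:

```python
# Training configurations as reported in the paper (key names are illustrative).
BIT_DIFFUSION_CONFIGS = {
    "cifar10": dict(
        train_steps=1_500_000, learning_rate=1e-4, batch_size=128,
        optimizer="adam", ema_decay=0.9999, self_conditioning=True,
    ),
    "imagenet_64x64": dict(
        train_steps=500_000, learning_rate=2e-4, batch_size=1024,
        optimizer="adam", ema_decay=0.9999, self_conditioning=True,
    ),
}
```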