Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning

Authors: Ting Chen, Ruixiang Zhang, Geoffrey Hinton

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We experiment with two different discrete data generation tasks, namely discrete/categorical image generation, and image captioning (image-conditional text generation)." and "Table 1: Comparison of FIDs on unconditional and class-conditional CIFAR-10." |
| Researcher Affiliation | Industry | "Ting Chen, Ruixiang Zhang, Geoffrey Hinton. Google Research, Brain Team. {iamtingchen,ruixiangz,geoffhinton}@google.com" |
| Pseudocode | Yes | "Algorithm 1 Bit Diffusion training algorithm." and "Algorithm 2 Bit Diffusion sampling algorithm." (see the sketch after the table) |
| Open Source Code | Yes | "Code at https://github.com/google-research/pix2seq." |
| Open Datasets | Yes | "Datasets. We use CIFAR-10 (Krizhevsky et al., 2009) and ImageNet 64×64 (Deng et al., 2009) for image generation experiments. For image captioning, following (Chen et al., 2022), we use MS-COCO 2017 captioning dataset (Lin et al., 2014)." |
| Dataset Splits | No | "Datasets. We use CIFAR-10 (Krizhevsky et al., 2009) and ImageNet 64×64 (Deng et al., 2009) for image generation experiments." and "We adopt widely used FID (Heusel et al., 2017) as the main evaluation metric, and it is computed between 50K generated samples and the whole training set." |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | "We use sentencepiece (Kudo & Richardson, 2018)..." and "We train our models with the Adam optimizer (Kingma & Ba, 2014)..." and "import tensorflow as tf" |
| Experiment Setup | Yes | "For CIFAR-10, we train the model for 1.5M steps with a constant learning rate of 0.0001 and batch size of 128. For ImageNet 64×64, we train the model for 500K steps with a constant learning rate of 0.0002 and batch size of 1024. For Bit Diffusion, we use Self-Conditioning by default, unless otherwise specified. We use an exponential moving average of the weights during training with a decay factor of 0.9999." (see the configuration sketch after the table) |
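The Pseudocode row refers to the paper's Algorithm 1 (Bit Diffusion training) and Algorithm 2 (Bit Diffusion sampling). Below is a minimal NumPy sketch of the core idea: integers are encoded as "analog bits" in {-1, 1}, a denoising network is trained to recover them, and self-conditioning feeds an earlier estimate of x0 back into the network on roughly half the training steps. The `denoise_net` signature, the cosine noise schedule, and the `train_step` helper are assumptions made for illustration, not the authors' implementation (which is the TensorFlow code in the pix2seq repository).

```python
import numpy as np

def int2bit(x_int, num_bits=8, scale=1.0):
    # Encode non-negative integers as "analog bits": each bit of the binary
    # expansion is mapped to {-scale, +scale} and treated as a real number.
    bits = ((np.asarray(x_int)[..., None] >> np.arange(num_bits)) & 1).astype(np.float32)
    return (bits * 2.0 - 1.0) * scale

def bit2int(x_bits, num_bits=8):
    # Decode analog bits back to integers by thresholding each bit at 0.
    bits = (np.asarray(x_bits) > 0).astype(np.int64)
    return (bits * (2 ** np.arange(num_bits))).sum(axis=-1)

def train_step(denoise_net, x_int, num_bits=8, self_cond_rate=0.5, rng=np.random):
    # One self-conditioned training step (loose sketch of the paper's Algorithm 1).
    # `denoise_net(x_t, x0_cond, t)` is a hypothetical callable that predicts x0.
    x0 = int2bit(x_int, num_bits)                            # analog-bits target
    t = rng.uniform(size=(x0.shape[0],) + (1,) * (x0.ndim - 1))
    gamma = np.cos(t * np.pi / 2.0) ** 2                     # assumed cosine noise schedule
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(gamma) * x0 + np.sqrt(1.0 - gamma) * eps   # corrupt the analog bits
    x0_cond = np.zeros_like(x0)                              # default: no self-conditioning
    if rng.uniform() < self_cond_rate:
        # First pass estimates x0; a real framework would stop gradients here.
        x0_cond = denoise_net(x_t, x0_cond, t)
    x0_pred = denoise_net(x_t, x0_cond, t)
    return np.mean((x0_pred - x0) ** 2)                      # L2 loss on analog bits
```

At sampling time (Algorithm 2 in the paper), the final continuous estimate is quantized back to discrete values with the same thresholding used in `bit2int`.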
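For the Experiment Setup row, the quoted hyperparameters can be collected into a small training configuration. The dictionary layout and the `ema_update` helper below are illustrative sketches, not the authors' code; only the numbers (training steps, learning rates, batch sizes, EMA decay of 0.9999) and the use of the Adam optimizer come from the paper.

```python
# Hyperparameters quoted in the paper (the optimizer is Adam, per the text).
CONFIGS = {
    "cifar10":    dict(train_steps=1_500_000, learning_rate=1e-4, batch_size=128),
    "imagenet64": dict(train_steps=500_000, learning_rate=2e-4, batch_size=1024),
}
EMA_DECAY = 0.9999  # exponential moving average of the weights during training

def ema_update(ema_params, params, decay=EMA_DECAY):
    # Standard EMA of model weights, applied after each optimizer step (illustrative helper).
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```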