Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning

Authors: Ting Chen, Ruixiang Zhang, Geoffrey Hinton

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We experiment with two different discrete data generation tasks, namely discrete/categorical image generation, and image captioning (image-conditional text generation)." and "Table 1: Comparison of FIDs on unconditional and class-conditional CIFAR-10." |
| Researcher Affiliation | Industry | "Ting Chen, Ruixiang Zhang, Geoffrey Hinton. Google Research, Brain Team. {iamtingchen,ruixiangz,geoffhinton}@google.com" |
| Pseudocode | Yes | "Algorithm 1 Bit Diffusion training algorithm." and "Algorithm 2 Bit Diffusion sampling algorithm." (see the sketch after the table) |
| Open Source Code | Yes | "Code at https://github.com/google-research/pix2seq." |
| Open Datasets | Yes | "Datasets. We use CIFAR-10 (Krizhevsky et al., 2009) and ImageNet 64×64 (Deng et al., 2009) for image generation experiments. For image captioning, following (Chen et al., 2022), we use MS-COCO 2017 captioning dataset (Lin et al., 2014)." |
| Dataset Splits | No | "Datasets. We use CIFAR-10 (Krizhevsky et al., 2009) and ImageNet 64×64 (Deng et al., 2009) for image generation experiments." and "We adopt widely used FID (Heusel et al., 2017) as the main evaluation metric, and it is computed between 50K generated samples and the whole training set." |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | "We use sentencepiece (Kudo & Richardson, 2018)..." and "We train our models with the Adam optimizer (Kingma & Ba, 2014)..." and "import tensorflow as tf" |
| Experiment Setup | Yes | "For CIFAR-10, we train the model for 1.5M steps with a constant learning rate of 0.0001 and batch size of 128. For ImageNet 64×64, we train the model for 500K steps with a constant learning rate of 0.0002 and batch size of 1024. For Bit Diffusion, we use Self-Conditioning by default, unless otherwise specified. We use an exponential moving average of the weights during training with a decay factor of 0.9999." (see the configuration sketch after the table) |
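The Pseudocode row refers to the paper's Algorithm 1 (Bit Diffusion training) and Algorithm 2 (Bit Diffusion sampling). Below is a minimal NumPy sketch of the core idea: integers are encoded as "analog bits" in {-1, 1}, a denoising network is trained to recover them, and self-conditioning feeds an earlier estimate of x0 back into the network on roughly half the training steps. The `denoise_net` signature, the cosine noise schedule, and the `train_step` helper are assumptions made for illustration, not the authors' implementation (which is the TensorFlow code in the pix2seq repository).

```python
import numpy as np

def int2bit(x_int, num_bits=8, scale=1.0):
    # Encode non-negative integers as "analog bits": each bit of the binary
    # expansion is mapped to {-scale, +scale} and treated as a real number.
    bits = ((np.asarray(x_int)[..., None] >> np.arange(num_bits)) & 1).astype(np.float32)
    return (bits * 2.0 - 1.0) * scale

def bit2int(x_bits, num_bits=8):
    # Decode analog bits back to integers by thresholding each bit at 0.
    bits = (np.asarray(x_bits) > 0).astype(np.int64)
    return (bits * (2 ** np.arange(num_bits))).sum(axis=-1)

def train_step(denoise_net, x_int, num_bits=8, self_cond_rate=0.5, rng=np.random):
    # One self-conditioned training step (loose sketch of the paper's Algorithm 1).
    # `denoise_net(x_t, x0_cond, t)` is a hypothetical callable that predicts x0.
    x0 = int2bit(x_int, num_bits)                            # analog-bits target
    t = rng.uniform(size=(x0.shape[0],) + (1,) * (x0.ndim - 1))
    gamma = np.cos(t * np.pi / 2.0) ** 2                     # assumed cosine noise schedule
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(gamma) * x0 + np.sqrt(1.0 - gamma) * eps   # corrupt the analog bits
    x0_cond = np.zeros_like(x0)                              # default: no self-conditioning
    if rng.uniform() < self_cond_rate:
        # First pass estimates x0; a real framework would stop gradients here.
        x0_cond = denoise_net(x_t, x0_cond, t)
    x0_pred = denoise_net(x_t, x0_cond, t)
    return np.mean((x0_pred - x0) ** 2)                      # L2 loss on analog bits
```

At sampling time (Algorithm 2 in the paper), the final continuous estimate is quantized back to discrete values with the same thresholding used in `bit2int`.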
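For the Experiment Setup row, the quoted hyperparameters can be collected into a small training configuration. The dictionary layout and the `ema_update` helper below are illustrative sketches, not the authors' code; only the numbers (training steps, learning rates, batch sizes, EMA decay of 0.9999) and the use of the Adam optimizer come from the paper.

```python
# Hyperparameters quoted in the paper (the optimizer is Adam, per the text).
CONFIGS = {
    "cifar10":    dict(train_steps=1_500_000, learning_rate=1e-4, batch_size=128),
    "imagenet64": dict(train_steps=500_000, learning_rate=2e-4, batch_size=1024),
}
EMA_DECAY = 0.9999  # exponential moving average of the weights during training

def ema_update(ema_params, params, decay=EMA_DECAY):
    # Standard EMA of model weights, applied after each optimizer step (illustrative helper).
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```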