Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Authors: Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua Susskind
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on both class- and text-conditioned image generation benchmarks. We show that Kaleido not only outperforms standard diffusion models in terms of diversity but also maintains the high quality of the generated images. Additionally, the generated latents effectively control the characteristics of the generated images, ensuring that the image samples closely align with the intended latent variables. This modeling of latent tokens not only increases the diversity of image outputs but also provides a degree of interpretability and control over the image generation process. |
| Researcher Affiliation | Collaboration | Apple; University of Illinois Urbana-Champaign (equal contribution) — {jgu32, szhai, yizzhang, njaitly, jsusskind}@apple.com; ying22@illinois.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code will be released after acceptance and internal review. |
| Open Datasets | Yes | For the former, we use ImageNet [Deng et al., 2009], and we learn the text-to-image models on CC12M [Changpinyo et al., 2021] |
| Dataset Splits | No | The paper does not explicitly specify training, validation, and test dataset splits needed to reproduce the experiment. It mentions evaluating metrics with '50K samples against the full training set' and using '10K samples' for diversity assessment, but does not detail how the data was split for model training and validation. |
| Hardware Specification | Yes | All experiments are performed on 64 A100 GPUs, which takes roughly 2 weeks for training 400k steps for both the ImageNet and CC12M datasets. |
| Software Dependencies | No | The paper mentions using specific models like T5-XL and Qwen-VL-Chat, and frameworks like DDPM, but it does not specify the version numbers of general software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | default training config: batch_size=512, num_updates=400_000, optimizer=adam, adam_beta1=0.9, adam_beta2=0.99, adam_eps=1e-8, learning_rate=1e-4, learning_rate_warmup_steps=10_000, weight_decay=0.0, gradient_clip_norm=2.0, ema_decay=0.9999, mixed_precision_training=bf16 |
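The quoted training configuration can be sketched in code. The sketch below is an assumption-laden illustration, not the authors' released code: it collects the hyperparameters from the table into a dict and implements three conventions they imply — linear LR warmup, global-norm gradient clipping, and EMA of parameters. The function names (`lr_at_step`, `clip_grad_norm`, `ema_update`) are hypothetical.

```python
import math

# Default training config as quoted in the paper (values from the table above).
config = {
    "batch_size": 512,
    "num_updates": 400_000,
    "optimizer": "adam",
    "adam_beta1": 0.9,
    "adam_beta2": 0.99,
    "adam_eps": 1e-8,
    "learning_rate": 1e-4,
    "learning_rate_warmup_steps": 10_000,
    "weight_decay": 0.0,
    "gradient_clip_norm": 2.0,
    "ema_decay": 0.9999,
    "mixed_precision_training": "bf16",
}

def lr_at_step(step: int) -> float:
    """Linear warmup to the base LR over the warmup steps (a common
    convention; the paper does not specify the warmup shape)."""
    warmup = config["learning_rate_warmup_steps"]
    return config["learning_rate"] * min(1.0, step / warmup)

def clip_grad_norm(grads: list[float], max_norm: float) -> list[float]:
    """Rescale a flat gradient vector so its L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

def ema_update(ema: list[float], new: list[float], decay: float) -> list[float]:
    """Exponential moving average of parameters: ema <- decay*ema + (1-decay)*new."""
    return [decay * e + (1 - decay) * n for e, n in zip(ema, new)]
```

For example, at step 5,000 the warmup LR would be half the base rate (5e-5), and a gradient vector of norm 5.0 would be rescaled to the configured clip norm of 2.0.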