Diffusion Models for Multi-Task Generative Modeling

Authors: Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on ImageNet indicate the effectiveness of our framework for various multi-modal generative modeling, which we believe is an important research direction worthy of more future explorations.
Researcher Affiliation | Collaboration | Changyou Chen (1, 2), Han Ding (2), Bunyamin Sisman (2), Yi Xu (2), Ouye Xie (2), Benjamin Yao (2), Son Tran (2), Belinda Zeng (2); affiliations: 1 University at Buffalo, 2 Amazon
Pseudocode | Yes | Algorithm 1, MT-Diffusion Inference (Appendix C); Algorithm 2, MT-Diffusion Training (Appendix E). A generic training-step sketch follows this table.
Open Source Code | No | The paper mentions building on existing codebases, namely the guided diffusion codebase (dif) and the latent diffusion codebase (ldm), but provides no link or statement about releasing its own implementation of the proposed MT-Diffusion.
Open Datasets | Yes | We mainly rely on the ImageNet-1K dataset (Deng et al., 2009) with resolutions of 64 × 64 and 128 × 128.
Dataset Splits | Yes | We mainly rely on the ImageNet-1K dataset (Deng et al., 2009) with resolutions of 64 × 64 and 128 × 128, where we adopt the pre-defined training and validation splits. (See the loading sketch below.)
Hardware Specification | Yes | All experiments are conducted on an A100 GPU server consisting of 8 GPUs, with a batch size of 64 unless explicitly specified otherwise.
Software Dependencies | No | The paper mentions using the guided diffusion codebase (dif) and the latent diffusion codebase (ldm), but it does not give version numbers for these components or for other dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We adopt the default hyper-parameters for training the models: attention resolutions (32, 16, 9); diffusion steps: 1000; learn sigma: False; noise schedule: linear; channels: 320; heads: 8; res blocks: 2; resblock updown: False; use scale shift norm: False; learning rate: 1.0e-4; batch size: 32. (Mirrored in the config sketch after this table.)
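The paper's pseudocode itself lives in Appendices C and E and is not reproduced here. For orientation, a standard DDPM-style training step under the quoted settings (linear noise schedule, epsilon-prediction loss) might look like the following minimal sketch. The function names, model interface, and schedule constants are illustrative assumptions; this is a generic single-task denoising step, not the paper's actual MT-Diffusion algorithm, which additionally aggregates multiple task modalities.

```python
import torch
import torch.nn.functional as F

def linear_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative alpha products for a linear beta schedule ('Noise schedule: Linear')."""
    betas = torch.linspace(beta_start, beta_end, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)

def diffusion_training_step(model, x0, alphas_cumprod):
    """One denoising training step: corrupt x0 at a random timestep, predict the noise."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward process q(x_t | x0)
    predicted_noise = model(x_t, t)  # epsilon-prediction U-Net (guided-diffusion style)
    return F.mse_loss(predicted_noise, noise)
```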
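Since the paper adopts the pre-defined ImageNet-1K training and validation splits, loading them with torchvision is straightforward. The root path and resolution below are placeholders; torchvision.datasets.ImageNet assumes the official archives are already downloaded.

```python
from torchvision import datasets, transforms

# Placeholder path; torchvision.datasets.ImageNet expects the official
# ImageNet-1K archives to already be present under this directory.
preprocess = transforms.Compose([
    transforms.Resize(64),       # the paper uses 64 x 64 and 128 x 128 resolutions
    transforms.CenterCrop(64),
    transforms.ToTensor(),
])
train_set = datasets.ImageNet(root="./imagenet", split="train", transform=preprocess)
val_set = datasets.ImageNet(root="./imagenet", split="val", transform=preprocess)
```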
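Finally, the quoted experiment setup maps naturally onto a flat configuration dictionary. The key names below are loosely styled after the guided-diffusion codebase's flags and are an assumption, not the paper's released config.

```python
# Illustrative configuration mirroring the quoted hyper-parameters.
# Key names are assumptions (guided-diffusion-style flags), not from the paper.
config = {
    "attention_resolutions": (32, 16, 9),
    "diffusion_steps": 1000,
    "learn_sigma": False,
    "noise_schedule": "linear",
    "num_channels": 320,
    "num_heads": 8,
    "num_res_blocks": 2,
    "resblock_updown": False,
    "use_scale_shift_norm": False,
    "learning_rate": 1.0e-4,
    "batch_size": 32,
}
```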