Diffusion Models for Multi-Task Generative Modeling
Authors: Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on ImageNet indicate the effectiveness of our framework for various multi-modal generative modeling, which we believe is an important research direction worthy of more future explorations. |
| Researcher Affiliation | Collaboration | Changyou Chen¹,², Han Ding², Bunyamin Sisman², Yi Xu², Ouye Xie², Benjamin Yao², Son Tran², Belinda Zeng² (¹University at Buffalo, ²Amazon) |
| Pseudocode | Yes | Algorithm 1 MT-Diffusion Inference [in Appendix C]; Algorithm 2 MT-Diffusion Training [in Appendix E] |
| Open Source Code | No | The paper mentions building on existing codebases (the guided diffusion codebase and the latent diffusion codebase) but does not provide a link to, or a statement about releasing, its own implementation of the proposed MT-Diffusion. |
| Open Datasets | Yes | we mainly rely on the ImageNet-1K dataset (Deng et al., 2009) with resolutions of 64 × 64 and 128 × 128 |
| Dataset Splits | Yes | we mainly rely on the ImageNet-1K dataset (Deng et al., 2009) with resolutions of 64 × 64 and 128 × 128, where we adopt the pre-defined training and validation splits. |
| Hardware Specification | Yes | All experiments are conducted on an A100 GPU server consisting of 8 GPUs, with a batch size of 64, if not explicitly specified. |
| Software Dependencies | No | The paper mentions using the guided diffusion and latent diffusion codebases, but it does not provide specific version numbers for these components or for other libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We adopt the default hyper-parameters for training the models... Attention resolutions: (32, 16, 9); Diffusion steps: 1000; Learn sigma: False; Noise schedule: Linear; #channels: 320; #heads: 8; #res blocks: 2; Resblock updown: False; Use scale shift norm: False; Learning rate: 1.0e-4; Batch size: 32 (these values are gathered into a config sketch after the table). |
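
For readability, the reported hyper-parameters can be collected into a single training configuration. The sketch below is not the authors' released code; the script and the name `training_config` are illustrative assumptions written in the spirit of the guided-diffusion-style configs the paper says it builds on, and it only records the values quoted in the table above.

```python
# Minimal, self-contained sketch (assumption, not the authors' code):
# the hyper-parameters reported in the "Experiment Setup" row, grouped
# into one dictionary. How a concrete codebase consumes these values is
# not specified in the paper and is left out here.

training_config = {
    # U-Net backbone
    "attention_resolutions": (32, 16, 9),  # feature-map sizes with attention
    "num_channels": 320,                   # base channel width
    "num_heads": 8,                        # attention heads
    "num_res_blocks": 2,                   # residual blocks per resolution
    "resblock_updown": False,
    "use_scale_shift_norm": False,
    # Diffusion process
    "diffusion_steps": 1000,
    "noise_schedule": "linear",
    "learn_sigma": False,                  # fixed (not learned) variances
    # Optimization
    "learning_rate": 1.0e-4,
    "batch_size": 32,                      # per the Experiment Setup row; the
                                           # Hardware row reports 64 on the 8-GPU server
}

if __name__ == "__main__":
    # Print the configuration so it can be checked against the paper's appendix.
    for key, value in training_config.items():
        print(f"{key}: {value}")
```

Note the two batch sizes quoted in the table (64 in the Hardware Specification row, 32 in the Experiment Setup row); the comment above preserves both rather than resolving the discrepancy.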