Denoising Task Routing for Diffusion Models

Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments reveal that DTR not only consistently boosts diffusion models' performance across different evaluation protocols without adding extra parameters but also accelerates training convergence. Finally, we show the complementarity between our architectural approach and existing MTL optimization techniques, providing a more complete view of MTL in the context of diffusion training. Significantly, by leveraging this complementarity, we attain matched performance of DiT-XL using the smaller DiT-L with a reduction in training iterations from 7M to 2M. Our project page is available at https://byeongjun-park.github.io/DTR/. Finally, we conduct experiments across various image generation tasks, such as unconditional, class-conditional, and text-to-image generation, with FFHQ (Karras et al., 2019), ImageNet (Deng et al., 2009), and the MS-COCO dataset (Lin et al., 2014), respectively. By incorporating our proposed DTR into two prominent architectures, DiT (Peebles & Xie, 2022) and ADM (Dhariwal & Nichol, 2021), we observe a significant enhancement in the quality of generated images, thereby validating the benefits of our DTR.
Researcher Affiliation | Collaboration | KAIST, Twelve Labs
Pseudocode | Yes | To provide further details, we illustrate pseudocode for the task routing mechanism and the routing mask instantiation in Sec. A.2. Pseudo Code 1 [NumPy-like]: Random Masking (Left) vs. DTR Masking (Right); Pseudo Code 2 [Simplified]: ADM block (Left) vs. ADM block + DTR (Right); Pseudo Code 3 [Simplified]: DiT block (Left) vs. DiT block + DTR (Right). (A minimal routing-mask sketch appears after the table.)
Open Source Code | Yes | Our project page is available at https://byeongjun-park.github.io/DTR/. To further future works from our work, we release our experimental codes and checkpoints at https://github.com/byeongjun-park/DTR.
Open Datasets | Yes | Finally, we conduct experiments across various image generation tasks, such as unconditional, class-conditional, and text-to-image generation, with FFHQ (Karras et al., 2019), ImageNet (Deng et al., 2009), and the MS-COCO dataset (Lin et al., 2014), respectively.
Dataset Splits | Yes | Text-to-image generation: we used MS-COCO (Lin et al., 2014), which contains 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions.
Hardware Specification | Yes | All the models were trained on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using the AdamW optimizer, the official code of DiT and ADM, and implementations of Min-SNR and P2, but does not provide specific version numbers for these software components or for the underlying frameworks (e.g., PyTorch, TensorFlow, Python, CUDA).
Experiment Setup | Yes | We employed the AdamW optimizer (Loshchilov & Hutter, 2019) with a fixed learning rate of 1e-4. No weight decay was applied during training. A batch size of 256 was used and a horizontal flip was applied to the training data. We utilized classifier-free guidance (Ho & Salimans, 2022) with a guidance scale set to 1.5 in conditional generation settings such as text-to-image generation and class-conditional image generation. (A minimal training-configuration sketch appears after the table.)
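
Routing-mask sketch (referenced from the Pseudocode row): the paper's Sec. A.2 gives NumPy-like pseudocode for the routing-mask instantiation; the snippet below is only a minimal NumPy sketch in that spirit, assuming a contiguous channel window that activates an `alpha` fraction of channels and slides with the timestep. The function name, the `alpha` value, and the sliding-window rule are illustrative assumptions, not the paper's exact instantiation.

```python
import numpy as np

def dtr_style_routing_mask(num_timesteps: int, num_channels: int, alpha: float = 0.8) -> np.ndarray:
    """Illustrative sketch (not the paper's exact rule): for each timestep t,
    activate a contiguous window covering an `alpha` fraction of channels,
    sliding the window with t so neighboring timesteps share most channels."""
    window = max(1, int(alpha * num_channels))
    masks = np.zeros((num_timesteps, num_channels), dtype=np.float32)
    for t in range(num_timesteps):
        # Window start slides linearly from 0 (t = 0) to num_channels - window (t = T - 1).
        start = round(t * (num_channels - window) / max(num_timesteps - 1, 1))
        masks[t, start:start + window] = 1.0
    return masks

# Example: 1000 denoising timesteps, 256 channels in a block.
masks = dtr_style_routing_mask(1000, 256)
print(masks.shape, masks.sum(axis=1)[:3])  # (1000, 256); each row activates ~204 channels
```

In contrast, the "Random Masking" baseline from Pseudo Code 1 would draw the active channels independently per timestep rather than from a sliding window.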
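
Training-configuration sketch (referenced from the Experiment Setup row): a minimal PyTorch sketch of the reported optimizer, augmentation, and guidance settings, assuming torchvision is available. The backbone and the `cfg_noise` helper are illustrative stand-ins, not the released DiT/ADM + DTR code.

```python
import torch
from torch import nn
from torchvision import transforms

# Placeholder backbone; the actual models are DiT/ADM variants with DTR applied per block.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 3, 3, padding=1))

# AdamW with a fixed learning rate of 1e-4 and no weight decay, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)

# Reported batch size and horizontal-flip augmentation on training images.
batch_size = 256
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

def cfg_noise(eps_cond: torch.Tensor, eps_uncond: torch.Tensor, guidance_scale: float = 1.5) -> torch.Tensor:
    """Standard classifier-free guidance combination applied at sampling time."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```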