Denoising Task Routing for Diffusion Models

Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments reveal that DTR not only consistently boosts diffusion models' performance across different evaluation protocols without adding extra parameters but also accelerates training convergence. Finally, we show the complementarity between our architectural approach and existing MTL optimization techniques, providing a more complete view of MTL in the context of diffusion training. Significantly, by leveraging this complementarity, we attain matched performance of DiT-XL using the smaller DiT-L with a reduction in training iterations from 7M to 2M. Our project page is available at https://byeongjun-park.github.io/DTR/. Finally, we conduct experiments across various image generation tasks, such as unconditional, class-conditional, and text-to-image generation, with FFHQ (Karras et al., 2019), ImageNet (Deng et al., 2009), and the MS-COCO dataset (Lin et al., 2014), respectively. By incorporating our proposed DTR into two prominent architectures, DiT (Peebles & Xie, 2022) and ADM (Dhariwal & Nichol, 2021), we observe a significant enhancement in the quality of generated images, thereby validating the benefits of our DTR.
Researcher Affiliation | Collaboration | KAIST, Twelve Labs
Pseudocode | Yes | To provide further details, we illustrate pseudocode for the task routing mechanism and the routing mask instantiation in Sec. A.2. Pseudo Code 1 [NumPy-like]: Random Masking (Left) vs. DTR Masking (Right); Pseudo Code 2 [Simplified]: ADM block (Left) vs. ADM block + DTR (Right); Pseudo Code 3 [Simplified]: DiT block (Left) vs. DiT block + DTR (Right). (A minimal routing-mask sketch appears after the table.)
Open Source Code | Yes | Our project page is available at https://byeongjun-park.github.io/DTR/. To further future works from our work, we release our experimental codes and checkpoints at https://github.com/byeongjun-park/DTR.
Open Datasets | Yes | Finally, we conduct experiments across various image generation tasks, such as unconditional, class-conditional, and text-to-image generation, with FFHQ (Karras et al., 2019), ImageNet (Deng et al., 2009), and the MS-COCO dataset (Lin et al., 2014), respectively.
Dataset Splits | Yes | Text-to-image generation: we used MS-COCO (Lin et al., 2014), which contains 82,783 training images and 40,504 validation images, each annotated with 5 descriptive captions.
Hardware Specification | Yes | All the models were trained on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using the AdamW optimizer, the official code of DiT and ADM, and implementations of Min-SNR and P2, but does not provide specific version numbers for these software components or for the underlying frameworks (e.g., PyTorch, TensorFlow, Python, CUDA).
Experiment Setup | Yes | We employed the AdamW optimizer (Loshchilov & Hutter, 2019) with a fixed learning rate of 1e-4. No weight decay was applied during training. A batch size of 256 was used and a horizontal flip was applied to the training data. We utilized classifier-free guidance (Ho & Salimans, 2022) with a guidance scale set to 1.5 in conditional generation settings such as text-to-image generation and class-conditional image generation. (A minimal training-configuration sketch appears after the table.)
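
Routing-mask sketch (referenced from the Pseudocode row): the paper's Sec. A.2 gives NumPy-like pseudocode for the routing-mask instantiation; the snippet below is only a minimal NumPy sketch in that spirit, assuming a contiguous channel window that activates an `alpha` fraction of channels and slides with the timestep. The function name, the `alpha` value, and the sliding-window rule are illustrative assumptions, not the paper's exact instantiation.

```python
import numpy as np

def dtr_style_routing_mask(num_timesteps: int, num_channels: int, alpha: float = 0.8) -> np.ndarray:
    """Illustrative sketch (not the paper's exact rule): for each timestep t,
    activate a contiguous window covering an `alpha` fraction of channels,
    sliding the window with t so neighboring timesteps share most channels."""
    window = max(1, int(alpha * num_channels))
    masks = np.zeros((num_timesteps, num_channels), dtype=np.float32)
    for t in range(num_timesteps):
        # Window start slides linearly from 0 (t = 0) to num_channels - window (t = T - 1).
        start = round(t * (num_channels - window) / max(num_timesteps - 1, 1))
        masks[t, start:start + window] = 1.0
    return masks

# Example: 1000 denoising timesteps, 256 channels in a block.
masks = dtr_style_routing_mask(1000, 256)
print(masks.shape, masks.sum(axis=1)[:3])  # (1000, 256); each row activates ~204 channels
```

In contrast, the "Random Masking" baseline from Pseudo Code 1 would draw the active channels independently per timestep rather than from a sliding window.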
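
Training-configuration sketch (referenced from the Experiment Setup row): a minimal PyTorch sketch of the reported optimizer, augmentation, and guidance settings, assuming torchvision is available. The backbone and the `cfg_noise` helper are illustrative stand-ins, not the released DiT/ADM + DTR code.

```python
import torch
from torch import nn
from torchvision import transforms

# Placeholder backbone; the actual models are DiT/ADM variants with DTR applied per block.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(), nn.Conv2d(64, 3, 3, padding=1))

# AdamW with a fixed learning rate of 1e-4 and no weight decay, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0)

# Reported batch size and horizontal-flip augmentation on training images.
batch_size = 256
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

def cfg_noise(eps_cond: torch.Tensor, eps_uncond: torch.Tensor, guidance_scale: float = 1.5) -> torch.Tensor:
    """Standard classifier-free guidance combination applied at sampling time."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```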