Consistency Diffusion Bridge Models

Authors: Guande He, Kaiwen Zheng, Jianfei Chen, Fan Bao, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our proposed method could sample 4× to 50× faster than the base DDBM and produce better visual quality given the same step in various tasks with pixel resolution ranging from 64×64 to 256×256, as well as supporting downstream tasks such as semantic interpolation in the data space."
Researcher Affiliation | Collaboration | ¹Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University, Beijing, China; ²Shengshu Technology, Beijing, China; ³Pazhou Lab (Huangpu), Guangzhou, China
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "The release of the code needs an official procedure related to the authors' affiliation, which is not approved yet."
Open Datasets | Yes | "For image-to-image translation, we use the Edges→Handbags [23] with 64×64 pixel resolution and DIODE-Outdoor [62] with 256×256 pixel resolution. For image inpainting, we choose ImageNet [9] 256×256 with a center mask of size 128×128." (A center-mask sketch follows the table.)
Dataset Splits | Yes | "The metrics are computed using the complete training set for Edges→Handbags and DIODE-Outdoor, and a validation subset of 10,000 images for ImageNet."
Hardware Specification | Yes | "We train the model with 8 NVIDIA A800 80G GPUs for 9.5 days..."
Software Dependencies | No | The paper mentions "mixed precision (fp16)" and the "RAdam [27, 34] optimizer" but does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "For training CDBMs, we use a global batch size of 128 and a learning rate of 1e-5 with mixed precision (fp16) for all datasets using 8 NVIDIA A800 80G GPUs. For the constant training schedule r(t) = t − Δt, we train the model for 50k steps, while for the sigmoid-style training schedule, we train the model for a varying number of steps (e.g., 30k or 60k) due to numerical instability when t − r(t) is small. For CBD, training a model for 50k steps on a dataset with 256×256 resolution takes 2.5 days, while CBT takes 1.5 days. In this work, we normalize all images within [−1, 1] and adopt the RAdam [27, 34] optimizer." (Hedged sketches of the training configuration and the r(t) schedules follow the table.)
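
The experiment-setup row above is essentially a training configuration. Below is a minimal PyTorch-style sketch of that configuration, assuming a PyTorch implementation (the paper's code is not released): only the stated hyperparameters (RAdam, learning rate 1e-5, global batch size 128 across 8 GPUs, fp16 mixed precision, images normalized to [−1, 1]) come from the excerpt, while the model, data, and loss are placeholders.

```python
# Hedged sketch of the quoted training configuration, not the authors' code.
# Grounded values: RAdam, lr 1e-5, global batch 128 (16 per GPU on 8 GPUs),
# fp16 mixed precision, images normalized to [-1, 1]. Everything else is a
# placeholder.
import torch
import torch.nn.functional as F
from torch.optim import RAdam
from torchvision import transforms

to_model_range = transforms.Compose([
    transforms.ToTensor(),                                  # [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),    # -> [-1, 1]
])

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder network
optimizer = RAdam(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()          # fp16 mixed-precision scaling

def training_step(x_start, x_end):
    """One fp16 training step; the consistency-bridge loss is a placeholder."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # fp16 autocast on CUDA
        pred = model(x_end)
        loss = F.mse_loss(pred, x_start)      # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```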
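For the ImageNet inpainting task, the only stated preprocessing detail is a 128×128 center mask on 256×256 images. The snippet below shows one straightforward way to build such a mask; it is an illustration of that description, not the authors' preprocessing code.

```python
import torch

def center_mask(batch, image_size=256, hole_size=128):
    """Binary mask for center-hole inpainting: 1 = observed, 0 = masked.
    Sizes follow the excerpt (256x256 images, 128x128 center mask)."""
    mask = torch.ones(batch, 1, image_size, image_size)
    start = (image_size - hole_size) // 2   # 64
    end = start + hole_size                 # 192
    mask[:, :, start:end, start:end] = 0.0
    return mask

# The masked (conditioning) image keeps only the observed pixels, e.g.:
# x_cond = x * center_mask(x.shape[0])
```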
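The excerpt contrasts a constant-gap training schedule, r(t) = t − Δt, with a sigmoid-style schedule under which the gap t − r(t) shrinks and eventually becomes small enough to cause numerical instability. The exact functional forms and constants are not given in this excerpt, so the sketch below only illustrates that qualitative difference; Δt, the sigmoid shape, and all constants are arbitrary placeholders.

```python
import math

def r_constant(t, delta=0.01):
    """Constant-gap schedule: t - r(t) = delta at every t.
    delta is arbitrary here; the paper's value is not stated in this excerpt."""
    return max(t - delta, 0.0)

def r_sigmoid(t, train_step, total_steps=60_000, max_gap=0.5, min_gap=1e-3):
    """Illustrative sigmoid-style schedule: the gap t - r(t) decays from
    max_gap toward min_gap as training proceeds, which matches the excerpt's
    report of instability once t - r(t) is small. The actual schedule in the
    paper may differ; this is a qualitative placeholder."""
    progress = train_step / total_steps
    gap = min_gap + (max_gap - min_gap) / (1.0 + math.exp(10.0 * (progress - 0.5)))
    return max(t - gap, 0.0)
```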