Consistency Diffusion Bridge Models

Authors: Guande He, Kaiwen Zheng, Jianfei Chen, Fan Bao, Jun Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our proposed method could sample 4× to 50× faster than the base DDBM and produce better visual quality given the same step in various tasks with pixel resolution ranging from 64×64 to 256×256, as well as supporting downstream tasks such as semantic interpolation in the data space."
Researcher Affiliation | Collaboration | ¹Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University, Beijing, China; ²Shengshu Technology, Beijing, China; ³Pazhou Lab (Huangpu), Guangzhou, China
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "The release of the code needs an official procedure related to the authors' affiliation, which is not approved yet."
Open Datasets | Yes | "For image-to-image translation, we use the Edges→Handbags [23] with 64×64 pixel resolution and DIODE-Outdoor [62] with 256×256 pixel resolution. For image inpainting, we choose ImageNet [9] 256×256 with a center mask of size 128×128." (A center-mask sketch follows the table.)
Dataset Splits | Yes | "The metrics are computed using the complete training set for Edges→Handbags and DIODE-Outdoor, and a validation subset of 10,000 images for ImageNet."
Hardware Specification | Yes | "We train the model with 8 NVIDIA A800 80G GPUs for 9.5 days..."
Software Dependencies | No | The paper mentions "mixed precision (fp16)" and the "RAdam [27, 34] optimizer" but does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "For training CDBMs, we use a global batch size of 128 and a learning rate of 1e-5 with mixed precision (fp16) for all datasets using 8 NVIDIA A800 80G GPUs. For the constant training schedule r(t) = t − Δt, we train the model for 50k steps, while for the sigmoid-style training schedule, we train the model for a varying number of steps (e.g., 30k or 60k) due to numerical instability when t − r(t) is small. For CBD, training a model for 50k steps on a dataset with 256×256 resolution takes 2.5 days, while CBT takes 1.5 days. In this work, we normalize all images within [−1, 1] and adopt the RAdam [27, 34] optimizer." (Hedged sketches of the training configuration and the r(t) schedules follow the table.)
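
The experiment-setup row above is essentially a training configuration. Below is a minimal PyTorch-style sketch of that configuration, assuming a PyTorch implementation (the paper's code is not released): only the stated hyperparameters (RAdam, learning rate 1e-5, global batch size 128 across 8 GPUs, fp16 mixed precision, images normalized to [−1, 1]) come from the excerpt, while the model, data, and loss are placeholders.

```python
# Hedged sketch of the quoted training configuration, not the authors' code.
# Grounded values: RAdam, lr 1e-5, global batch 128 (16 per GPU on 8 GPUs),
# fp16 mixed precision, images normalized to [-1, 1]. Everything else is a
# placeholder.
import torch
import torch.nn.functional as F
from torch.optim import RAdam
from torchvision import transforms

to_model_range = transforms.Compose([
    transforms.ToTensor(),                                  # [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),    # -> [-1, 1]
])

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder network
optimizer = RAdam(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()          # fp16 mixed-precision scaling

def training_step(x_start, x_end):
    """One fp16 training step; the consistency-bridge loss is a placeholder."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # fp16 autocast on CUDA
        pred = model(x_end)
        loss = F.mse_loss(pred, x_start)      # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```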
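For the ImageNet inpainting task, the only stated preprocessing detail is a 128×128 center mask on 256×256 images. The snippet below shows one straightforward way to build such a mask; it is an illustration of that description, not the authors' preprocessing code.

```python
import torch

def center_mask(batch, image_size=256, hole_size=128):
    """Binary mask for center-hole inpainting: 1 = observed, 0 = masked.
    Sizes follow the excerpt (256x256 images, 128x128 center mask)."""
    mask = torch.ones(batch, 1, image_size, image_size)
    start = (image_size - hole_size) // 2   # 64
    end = start + hole_size                 # 192
    mask[:, :, start:end, start:end] = 0.0
    return mask

# The masked (conditioning) image keeps only the observed pixels, e.g.:
# x_cond = x * center_mask(x.shape[0])
```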
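The excerpt contrasts a constant-gap training schedule, r(t) = t − Δt, with a sigmoid-style schedule under which the gap t − r(t) shrinks and eventually becomes small enough to cause numerical instability. The exact functional forms and constants are not given in this excerpt, so the sketch below only illustrates that qualitative difference; Δt, the sigmoid shape, and all constants are arbitrary placeholders.

```python
import math

def r_constant(t, delta=0.01):
    """Constant-gap schedule: t - r(t) = delta at every t.
    delta is arbitrary here; the paper's value is not stated in this excerpt."""
    return max(t - delta, 0.0)

def r_sigmoid(t, train_step, total_steps=60_000, max_gap=0.5, min_gap=1e-3):
    """Illustrative sigmoid-style schedule: the gap t - r(t) decays from
    max_gap toward min_gap as training proceeds, which matches the excerpt's
    report of instability once t - r(t) is small. The actual schedule in the
    paper may differ; this is a qualitative placeholder."""
    progress = train_step / total_steps
    gap = min_gap + (max_gap - min_gap) / (1.0 + math.exp(10.0 * (progress - 0.5)))
    return max(t - gap, 0.0)
```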