Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

Authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | CTM achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64×64 resolution (FID 1.92). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories (see the γ-sampling sketch after the table). It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods from the diffusion community. This access also enables the computation of likelihood. |
| Researcher Affiliation | Collaboration | Dongjun Kim & Chieh-Hsin Lai (Sony AI, Tokyo, Japan; dongjun@stanford.edu, chieh-hsin.lai@sony.com); Wei-Hsiang Liao, Naoki Murata, Yuhta Takida & Toshimitsu Uesaka (Sony AI, Tokyo, Japan); Yutong He (Carnegie Mellon University, PA, USA); Yuki Mitsufuji (Sony AI, Sony Group Corporation, Tokyo, Japan); Stefano Ermon (Stanford University, CA, USA) |
| Pseudocode | Yes | Algorithm 2 (CTM's γ-sampling, sketched below the table), Algorithm 3 (Loss-based Trajectory Optimization), and Algorithm 4 (CTM Training). |
| Open Source Code | Yes | The code is available at https://github.com/sony/ctm. |
| Open Datasets | Yes | We evaluate CTM on CIFAR-10 and ImageNet 64×64, using the pre-trained diffusion checkpoints from EDM (CIFAR-10) and CM (ImageNet) as the teacher models. |
| Dataset Splits | Yes | We evaluate CTM on CIFAR-10 and ImageNet 64×64. Table 2 reports "Validation Data" metrics, indicating that a validation set is used, and the standard CIFAR-10 and ImageNet datasets come with well-defined splits. |
| Hardware Specification | Yes | We use 4 V100 (16GB) GPUs for CIFAR-10 experiments and 8 A100 (40GB) GPUs for ImageNet experiments. |
| Software Dependencies | No | The paper mentions the "official PyTorch code of CM" and specific model implementations such as "EDM's DDPM++ implementation" but does not provide version numbers for software dependencies like PyTorch or CUDA. |
| Experiment Setup | Yes | Table 4, "Experimental details on hyperparameters," lists specific values for the learning rate, the student's stop-grad EMA parameter µ, N, the ODE solver, the maximum number of ODE steps, the EMA decay rate, training iterations, mixed precision (FP16), batch size, and the number of GPUs. The text also gives σmin = 0.002, σmax = 80, ρ = 7, and σdata = 0.5 (see the noise-schedule sketch after the table). |
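
The γ-sampling procedure referenced in the Pseudocode row (Algorithm 2) is compact enough to sketch. The version below is a minimal reconstruction, assuming an EDM-style parameterization in which a sample at noise level t has marginal variance t² and `G(x, t, s)` denotes the trained CTM jump function from level t to level s; the name `gamma_sampling` and the exact signature are illustrative, not the repository's API.

```python
import torch

def gamma_sampling(G, x_T, timesteps, gamma=0.0):
    """Sketch of CTM's gamma-sampling (Algorithm 2).

    timesteps is a decreasing sequence t_0 = sigma_max > ... > t_N = 0.
    gamma = 0 gives fully deterministic long jumps along the PF ODE;
    gamma = 1 recovers CM-style multistep sampling with full noise
    re-injection at every step.
    """
    x = x_T
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        # Deterministic jump along the learned trajectory down to the
        # intermediate noise level sqrt(1 - gamma^2) * t_next.
        s = (1.0 - gamma**2) ** 0.5 * t_next
        x = G(x, t_cur, s)
        # Noise re-injection restores the marginal variance at t_next:
        # (1 - gamma^2) * t_next^2 + gamma^2 * t_next^2 = t_next^2.
        if gamma > 0 and t_next > 0:
            x = x + gamma * t_next * torch.randn_like(x)
    return x
```

Interpolating γ between 0 and 1 is what yields CTM's family of samplers, trading determinism against stochastic refreshes along the trajectory.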
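
The σmin, σmax, and ρ values quoted in the Experiment Setup row are the standard EDM discretization parameters, so the implied noise schedule can be written down directly. The helper name `karras_sigmas` below is ours, a hedged sketch rather than code from the CTM repository.

```python
import numpy as np

def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM-style noise-level discretization with the quoted defaults.

    Interpolates linearly in sigma^(1/rho) space, which concentrates
    steps near sigma_min, and appends a terminal level of 0.
    """
    ramp = np.linspace(0.0, 1.0, n_steps)
    inv_rho = 1.0 / rho
    sigmas = (sigma_max**inv_rho
              + ramp * (sigma_min**inv_rho - sigma_max**inv_rho)) ** rho
    return np.append(sigmas, 0.0)
```

For example, `karras_sigmas(18)` produces an 18-level schedule running from 80 down to 0.002 (plus the terminal 0), the kind of grid an EDM-style teacher's ODE would be solved on.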