Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
Authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | CTM achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64×64 resolution (FID 1.92). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories (a sketch of this γ-sampling appears after the table). It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods from the diffusion community. This access also enables the computation of likelihood. |
| Researcher Affiliation | Collaboration | Dongjun Kim & Chieh-Hsin Lai, Sony AI, Tokyo, Japan (dongjun@stanford.edu, chieh-hsin.lai@sony.com); Wei-Hsiang Liao, Naoki Murata, Yuhta Takida & Toshimitsu Uesaka, Sony AI, Tokyo, Japan; Yutong He, Carnegie Mellon University, PA, USA; Yuki Mitsufuji, Sony AI & Sony Group Corporation, Tokyo, Japan; Stefano Ermon, Stanford University, CA, USA |
| Pseudocode | Yes | Algorithm 2 (CTM's γ-sampling), Algorithm 3 (Loss-based Trajectory Optimization), Algorithm 4 (CTM Training) |
| Open Source Code | Yes | The code is available at https://github.com/sony/ctm. |
| Open Datasets | Yes | We evaluate CTM on CIFAR-10 and ImageNet 64×64, using the pre-trained diffusion checkpoints from EDM (CIFAR-10) and CM (ImageNet) as the teacher models. |
| Dataset Splits | Yes | We evaluate CTM on CIFAR-10 and ImageNet 64×64. Table 2 includes "Validation Data" metrics, indicating the use of a validation set. The use of standard datasets like CIFAR-10 and ImageNet implies well-defined standard splits are used. |
| Hardware Specification | Yes | We use 4 V100 (16G) GPUs for CIFAR-10 experiments and 8 A100 (40G) GPUs for ImageNet experiments. |
| Software Dependencies | No | The paper mentions the "official PyTorch code of CM" and specific model implementations like "EDM's DDPM++ implementation", but does not provide version numbers for software dependencies such as PyTorch or CUDA. |
| Experiment Setup | Yes | Table 4 ("Experimental details on hyperparameters") lists specific values for the learning rate, the student's stop-grad EMA parameter µ, N, the ODE solver, the maximum number of ODE steps, the EMA decay rate, the training iterations, mixed precision (FP16), the batch size, and the number of GPUs. The text also gives specific values such as σ_min = 0.002, σ_max = 80, ρ = 7, and σ_data = 0.5 (a sketch of the resulting noise schedule appears below). |
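
Since γ-sampling (Algorithm 2) is referenced twice above, a minimal PyTorch sketch may help. It assumes a hypothetical trained network `ctm_model(x, t, s)` that jumps a sample from noise level t to level s along the PF ODE, as CTM's trajectory function does; this is a sketch of the scheme's structure, not the paper's implementation.

```python
import torch

def gamma_sampling(ctm_model, x_init, times, gamma=0.0):
    """Sketch of CTM-style γ-sampling.

    ctm_model(x, t, s): assumed network mapping a sample at noise level t
                        to the PF-ODE solution at level s (hypothetical API).
    times: decreasing noise levels t_0 = T > t_1 > ... > t_N = 0.
    gamma: 0 gives fully deterministic long jumps; 1 reduces to
           CM-style multistep sampling (full denoise, then re-noise).
    """
    x = x_init
    for t_cur, t_next in zip(times[:-1], times[1:]):
        # Deterministic long jump down to an intermediate noise level.
        s = (1.0 - gamma**2) ** 0.5 * t_next
        x = ctm_model(x, t_cur, s)
        # Stochastic re-noising back up to t_next (skipped when gamma == 0).
        if gamma > 0:
            x = x + gamma * t_next * torch.randn_like(x)
    return x
```

Varying γ between 0 and 1 trades off the deterministic and stochastic regimes, which is the "new family of sampling schemes" noted in the Research Type cell.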
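The σ values in the Experiment Setup cell follow EDM's ρ-warped discretization (Karras et al., 2022); a short sketch of how sampling noise levels are typically derived from σ_min, σ_max, and ρ (function name illustrative):

```python
import torch

def edm_noise_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # rho-warped interpolation between sigma_max and sigma_min
    # (EDM discretization, Karras et al., 2022).
    i = torch.arange(n_steps)
    inv_rho = 1.0 / rho
    sigmas = (sigma_max**inv_rho
              + i / (n_steps - 1) * (sigma_min**inv_rho - sigma_max**inv_rho)) ** rho
    return torch.cat([sigmas, torch.zeros(1)])  # append final sigma = 0
```

The resulting decreasing grid of noise levels can serve as the `times` schedule in the γ-sampling sketch above.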