Improved Techniques for Training Consistency Models
Authors: Yang Song, Prafulla Dhariwal
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To tackle these challenges, we present improved techniques for consistency training, where consistency models learn directly from data without distillation. Combined with better hyperparameter tuning, these modifications enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet 64×64 respectively in a single sampling step. These scores mark a 3.5× and 4× improvement compared to prior consistency training approaches. Through two-step sampling, we further reduce FID scores to 2.24 and 2.77 on these two datasets, surpassing those obtained via distillation in both one-step and two-step settings, while narrowing the gap between consistency models and other state-of-the-art generative models. |
| Researcher Affiliation | Industry | Yang Song & Prafulla Dhariwal, OpenAI {songyang,prafulla}@openai.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for open-source code related to the methodology. |
| Open Datasets | Yes | All models are trained on the CIFAR-10 dataset (Krizhevsky et al., 2014) without class labels. We observe similar improvements on other datasets, including ImageNet 64×64 (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and ImageNet 64x64, but does not explicitly provide details about training/validation/test splits, such as percentages or specific sample counts for a validation set. |
| Hardware Specification | Yes | All models are trained on a cluster of Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using the RAdam optimizer and specific architectures (NCSN++, ADM) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train all models with the RAdam optimizer (Liu et al., 2019) using learning rate 0.0001. All CIFAR-10 models are trained for 400,000 iterations, whereas ImageNet 64×64 models are trained for 800,000 iterations. For CIFAR-10 models in Section 3, we use batch size 512 and EMA decay rate 0.9999 for the student network. For iCT and iCT-deep models in Table 2, we use batch size 1024 and EMA decay rate of 0.99993 for CIFAR-10 models, and batch size 4096 and EMA decay rate 0.99997 for ImageNet 64×64 models. We use a dropout rate of 0.3 for all consistency models on CIFAR-10. For ImageNet 64×64, we use a dropout rate of 0.2. |
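
Taking the Experiment Setup row at face value, the following is a minimal sketch of a training loop wired to those reported hyperparameters (iCT CIFAR-10 setting). The tiny stand-in network, random data, and placeholder loss are illustrative assumptions, not the authors' NCSN++/ADM implementation or consistency-training objective; only the optimizer choice, learning rate, batch size, EMA decay, dropout rate, and iteration counts come from the paper.

```python
# Hedged sketch of the reported training configuration (iCT CIFAR-10 setting).
# Only the hyperparameter values are taken from the paper; everything else
# (network, data, loss) is a placeholder assumption for illustration.
import copy
import torch
import torch.nn as nn

# Stand-in for the consistency model; the paper uses NCSN++ / ADM backbones.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.SiLU(),
    nn.Dropout(p=0.3),                 # 0.3 on CIFAR-10, 0.2 on ImageNet 64x64
    nn.Conv2d(64, 3, 3, padding=1),
)
ema_model = copy.deepcopy(model)       # EMA copy used for evaluation/sampling

optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4)  # RAdam, lr 0.0001

EMA_DECAY = 0.99993     # iCT CIFAR-10; 0.99997 for ImageNet 64x64, 0.9999 in Sec. 3
BATCH_SIZE = 1024       # iCT CIFAR-10; 4096 for ImageNet 64x64, 512 in Sec. 3
TOTAL_ITERS = 400_000   # CIFAR-10; 800_000 for ImageNet 64x64

for step in range(TOTAL_ITERS):
    x = torch.randn(BATCH_SIZE, 3, 32, 32)   # placeholder for CIFAR-10 batches
    # Placeholder objective; the paper's consistency-training loss goes here.
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Exponential moving average of the weights at the reported decay rate.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(EMA_DECAY).add_(p, alpha=1 - EMA_DECAY)
```

The EMA update is the standard per-step convex combination implied by a scalar decay rate; the paper does not specify further scheduling details, so this sketch applies it after every optimizer step.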