Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Authors: Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5." |
| Researcher Affiliation | Industry | "Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao. ByteDance. Project Page: https://hyper-sd.github.io/. Project Lead. Correspondence to <xiaoxuefeng.ailab@bytedance.com>." |
| Pseudocode | Yes | "Algorithm 1 Trajectory Segmented Consistency Distillation (TSCD)" (a minimal sketch of the algorithm follows this table) |
| Open Source Code | Yes | "We have open-sourced LoRAs for SDXL and SD1.5 from 1 to 8 inference steps, along with a dedicated one-step SDXL model, aiming to further propel the development of the generative AI community." |
| Open Datasets | Yes | "We use a subset of the LAION [30] and COYO [6] datasets following SDXL-Lightning [14] during the training procedure of Sec 3.1 and Sec 3.3. For the Human Feedback Learning in Sec 3.2, we utilized the COCO2017 train split dataset with instance annotations and captions for structure optimization." |
| Dataset Splits | No | The paper mentions using the 'COCO2017 train split' for one part of training (Human Feedback Learning) and 'COCO-5k' for evaluation, but it does not specify explicit training/validation/test splits for the overall model training. |
| Hardware Specification | Yes | "Our training per stage costs around 200 A100 GPU hours." |
| Software Dependencies | No | The paper mentions software such as Python, PyTorch, and specific models (SOLO, the LAION aesthetic predictor, ImageReward) but does not provide version numbers for any of these components. |
| Experiment Setup | Yes | "For TSCD in Sec 3.1, we progressively reduced the number of time-step segments as 8, 4, 2, 1 in four stages, employing a batch size of 512 and a learning rate of 1e-6... For one-step enhancement in Sec 3.3, we trained the unified all-timesteps consistency LoRA with time-step input T = 999 and the dedicated model for single-step generation with T = 800." (a staged-schedule sketch follows the TSCD sketch below) |
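The Pseudocode row refers to the paper's Algorithm 1 (TSCD). The PyTorch sketch below illustrates the core idea under stated assumptions: `student`, `ema_student`, and `teacher_solver` are placeholder callables, and the linear DDPM noise schedule is a stand-in for the actual SD/SDXL schedule; none of these names come from the authors' released code.

```python
"""Minimal sketch of Trajectory Segmented Consistency Distillation (TSCD).

Assumptions: `student`, `ema_student`, and `teacher_solver` are placeholder
callables, and the linear schedule below is a stand-in for the real one.
"""
import torch
import torch.nn.functional as F

T_MAX = 1000  # total diffusion training timesteps

# Stand-in linear DDPM noise schedule (not the actual SD/SDXL schedule).
betas = torch.linspace(1e-4, 0.02, T_MAX)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, noise, t):
    """Forward diffusion q(x_t | x_0) under the stand-in schedule."""
    a = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    sigma = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + sigma * noise

def tscd_loss(student, ema_student, teacher_solver, x0, num_segments):
    """One TSCD loss term: consistency is enforced only within a segment.

    Training proceeds in stages with num_segments = 8, 4, 2, 1, so the
    final stage recovers a full-trajectory consistency model.
    """
    seg_len = T_MAX // num_segments
    b = x0.shape[0]
    seg = torch.randint(0, num_segments, (b,))            # which segment
    t = seg * seg_len + torch.randint(1, seg_len, (b,))   # timestep inside it
    t_start = seg * seg_len                               # segment boundary (target)
    x_t = add_noise(x0, torch.randn_like(x0), t)
    with torch.no_grad():
        x_start = teacher_solver(x_t, t, t_start)  # teacher ODE step t -> t_start
        target = ema_student(x_start, t_start)     # self-consistency target
    return F.mse_loss(student(x_t, t), target)
```

The sketch omits the paper's human-feedback and adversarial objectives; it only shows how segmenting the trajectory shortens the interval over which consistency must hold at each stage.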
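For the Experiment Setup row, a hypothetical driver for the four-stage schedule might look as follows. `train_tscd_stage` is an assumed wrapper (optimizer plus dataloader around `tscd_loss` above), not the authors' code; only the segment counts, batch size, learning rate, and one-step timesteps come from the paper.

```python
# Hypothetical four-stage driver mirroring the reported setup.
# train_tscd_stage is an assumed wrapper around tscd_loss (optimizer + data).
for num_segments in (8, 4, 2, 1):   # progressively fewer trajectory segments
    train_tscd_stage(num_segments=num_segments, batch_size=512, lr=1e-6)

# One-step enhancement (Sec 3.3), per the paper: the unified all-timesteps
# consistency LoRA is trained with time-step input T = 999; the dedicated
# single-step SDXL model uses T = 800.
```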