Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Authors: Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. |
| Researcher Affiliation | Industry | Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao — ByteDance. Project Page: https://hyper-sd.github.io/ Project Lead. Correspondence to <xiaoxuefeng.ailab@bytedance.com>. |
| Pseudocode | Yes | Algorithm 1 Trajectory Segmented Consistency Distillation (TSCD) |
| Open Source Code | Yes | We have open-sourced LoRAs for SDXL and SD1.5 from 1 to 8 steps inference, along with a dedicated one-step SDXL model, aiming to further propel the development of generative AI community. |
| Open Datasets | Yes | We use a subset of the LAION [30] and COYO [6] datasets following SDXL-lightning [14] during the training procedure of Sec 3.1 and Sec 3.3. For the Human Feedback Learning in Sec 3.2, we utilized the COCO2017 train split dataset with instance annotations and captions for structure optimization. |
| Dataset Splits | No | The paper mentions using 'COCO2017 train split dataset' for a specific part of training (Human Feedback Learning) and 'COCO-5k' for evaluation, but it does not specify explicit training/validation/test splits for the overall model training. |
| Hardware Specification | Yes | Our training per stage costs around 200 A100 GPU hours. |
| Software Dependencies | No | The paper mentions software like Python, PyTorch, and specific models (SOLO, LAION aesthetic predictor, Image Reward) but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | For TSCD in Sec 3.1, we progressively reduced the number of time-step segments as 8, 4, 2, 1 in four stages, employing a 512 batch size and learning rate 1e-6... For one-step enhancement in Sec 3.3, we trained the unified all-timesteps consistency LoRA with time-step inputs T = 999 and the dedicated model for single-step generation with T = 800. |
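The progressive segment schedule quoted in the Experiment Setup row (8, 4, 2, 1 segments over a 1000-step diffusion trajectory) can be sketched as follows. This is a minimal illustration under the assumption of uniform segments; the function names are illustrative and not taken from the paper's released code.

```python
def segment_boundaries(num_train_timesteps: int, num_segments: int):
    """Split [0, num_train_timesteps) into equal consistency segments."""
    size = num_train_timesteps // num_segments
    return [(i * size, (i + 1) * size) for i in range(num_segments)]


def segment_endpoint(t: int, num_train_timesteps: int, num_segments: int) -> int:
    """Map a timestep t to the start boundary of its segment, i.e. the
    point the consistency function targets within that segment."""
    size = num_train_timesteps // num_segments
    return (t // size) * size


# Progressive schedule from the paper's setup: 8 -> 4 -> 2 -> 1 segments.
# As segments merge, a given timestep's target moves closer to t = 0.
for k in (8, 4, 2, 1):
    print(k, segment_endpoint(700, 1000, k))
```

With a single segment (the final stage), every timestep maps to 0, recovering ordinary full-trajectory consistency distillation.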