Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Authors: Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5." |
| Researcher Affiliation | Industry | "Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao. ByteDance. Project Page: https://hyper-sd.github.io/. Project Lead. Correspondence to <xiaoxuefeng.ailab@bytedance.com>." |
| Pseudocode | Yes | "Algorithm 1 Trajectory Segmented Consistency Distillation (TSCD)" (a minimal sketch of the algorithm follows this table) |
| Open Source Code | Yes | "We have open-sourced LoRAs for SDXL and SD1.5 from 1 to 8 inference steps, along with a dedicated one-step SDXL model, aiming to further propel the development of the generative AI community." |
| Open Datasets | Yes | "We use a subset of the LAION [30] and COYO [6] datasets following SDXL-Lightning [14] during the training procedure of Sec 3.1 and Sec 3.3. For the Human Feedback Learning in Sec 3.2, we utilized the COCO2017 train split dataset with instance annotations and captions for structure optimization." |
| Dataset Splits | No | The paper mentions using the 'COCO2017 train split' for one part of training (Human Feedback Learning) and 'COCO-5k' for evaluation, but it does not specify explicit training/validation/test splits for the overall model training. |
| Hardware Specification | Yes | "Our training per stage costs around 200 A100 GPU hours." |
| Software Dependencies | No | The paper mentions software such as Python, PyTorch, and specific models (SOLO, the LAION aesthetic predictor, ImageReward) but does not provide version numbers for any of these components. |
| Experiment Setup | Yes | "For TSCD in Sec 3.1, we progressively reduced the number of time-step segments as 8, 4, 2, 1 in four stages, employing a batch size of 512 and a learning rate of 1e-6... For one-step enhancement in Sec 3.3, we trained the unified all-timesteps consistency LoRA with time-step input T = 999 and the dedicated model for single-step generation with T = 800." (a staged-schedule sketch follows the TSCD sketch below) |
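The Pseudocode row refers to the paper's Algorithm 1 (TSCD). The PyTorch sketch below illustrates the core idea under stated assumptions: `student`, `ema_student`, and `teacher_solver` are placeholder callables, and the linear DDPM noise schedule is a stand-in for the actual SD/SDXL schedule; none of these names come from the authors' released code.

```python
"""Minimal sketch of Trajectory Segmented Consistency Distillation (TSCD).

Assumptions: `student`, `ema_student`, and `teacher_solver` are placeholder
callables, and the linear schedule below is a stand-in for the real one.
"""
import torch
import torch.nn.functional as F

T_MAX = 1000  # total diffusion training timesteps

# Stand-in linear DDPM noise schedule (not the actual SD/SDXL schedule).
betas = torch.linspace(1e-4, 0.02, T_MAX)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, noise, t):
    """Forward diffusion q(x_t | x_0) under the stand-in schedule."""
    a = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
    sigma = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + sigma * noise

def tscd_loss(student, ema_student, teacher_solver, x0, num_segments):
    """One TSCD loss term: consistency is enforced only within a segment.

    Training proceeds in stages with num_segments = 8, 4, 2, 1, so the
    final stage recovers a full-trajectory consistency model.
    """
    seg_len = T_MAX // num_segments
    b = x0.shape[0]
    seg = torch.randint(0, num_segments, (b,))            # which segment
    t = seg * seg_len + torch.randint(1, seg_len, (b,))   # timestep inside it
    t_start = seg * seg_len                               # segment boundary (target)
    x_t = add_noise(x0, torch.randn_like(x0), t)
    with torch.no_grad():
        x_start = teacher_solver(x_t, t, t_start)  # teacher ODE step t -> t_start
        target = ema_student(x_start, t_start)     # self-consistency target
    return F.mse_loss(student(x_t, t), target)
```

The sketch omits the paper's human-feedback and adversarial objectives; it only shows how segmenting the trajectory shortens the interval over which consistency must hold at each stage.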
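For the Experiment Setup row, a hypothetical driver for the four-stage schedule might look as follows. `train_tscd_stage` is an assumed wrapper (optimizer plus dataloader around `tscd_loss` above), not the authors' code; only the segment counts, batch size, learning rate, and one-step timesteps come from the paper.

```python
# Hypothetical four-stage driver mirroring the reported setup.
# train_tscd_stage is an assumed wrapper around tscd_loss (optimizer + data).
for num_segments in (8, 4, 2, 1):   # progressively fewer trajectory segments
    train_tscd_stage(num_segments=num_segments, batch_size=512, lr=1e-6)

# One-step enhancement (Sec 3.3), per the paper: the unified all-timesteps
# consistency LoRA is trained with time-step input T = 999; the dedicated
# single-step SDXL model uses T = 800.
```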