Gaussian Mixture Solvers for Diffusion Models

Authors: Hanzhong Guo, Cheng Lu, Fan Bao, Tianyu Pang, Shuicheng Yan, Chao Du, Chongxuan Li

Venue: NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis in various diffusion models, which validates the motivation and effectiveness of GMS.
Researcher Affiliation | Collaboration | Hanzhong Guo (1,3), Cheng Lu (4), Fan Bao (4), Tianyu Pang (2), Shuicheng Yan (2), Chao Du (2), Chongxuan Li (1,3); 1: Gaoling School of Artificial Intelligence, Renmin University of China; 2: Sea AI Lab, Singapore; 3: Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; 4: Tsinghua University; {allanguo, tianyupang, yansc, duchao}@sea.com; lucheng.lc15@gmail.com; bf19@mails.tsinghua.edu.cn; chongxuanli@ruc.edu.cn
Pseudocode | Yes | Algorithm 1: learning of the higher-order noise network; Algorithm 2: sampling via GMS with the first three moments. (A hedged sketch of the final mixture-sampling step appears after this table.)
Open Source Code | Yes | Our code is available at https://github.com/Guohanzhong/GMS.
Open Datasets | Yes | Our results show that GMS outperforms state-of-the-art SDE-based solvers [14, 2, 17] in terms of sample quality with a limited number of discretization steps (e.g., < 100). For instance, GMS improves the FID by 4.44 over the SOTA SDE-based solver [2] given 10 steps on CIFAR10. Furthermore, we evaluate GMS on a stroke-based synthesis task. The findings consistently reveal that GMS achieves higher levels of realism than all aforementioned SDE-based solvers, as well as the widely adopted ODE-based solver DDIM [36], while maintaining comparable computation budgets and faithfulness scores (measured by L2 distance).
Dataset Splits | No | The paper mentions selecting models with the best FID on generated samples but does not specify a distinct validation split of the input datasets (CIFAR10, ImageNet) or its proportions.
Hardware Specification | Yes | Training one noise network on CIFAR10 takes about 100 hours on one A100. Training on ImageNet 64x64 takes about 150 hours on one A100.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies beyond general mentions of libraries or frameworks.
Experiment Setup | Yes | On all datasets, we use the Adan optimizer [41] with a learning rate of 1e-4; we train 2M iterations in total for the higher-order noise network; we use an exponential moving average (EMA) with a rate of 0.9999. We use a batch size of 64 on ImageNet 64x64 and 128 on CIFAR10. (A minimal EMA-update sketch follows below.)
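
To make the pseudocode row concrete, the snippet below is a minimal PyTorch sketch of the last step of a Gaussian-mixture transition kernel: drawing the next sample once per-coordinate mixture weights, means, and standard deviations are available. It is not the paper's Algorithm 2; how GMS derives those parameters from the higher-order noise network's moment predictions is omitted, and the function name `sample_gaussian_mixture` and all tensor shapes are illustrative assumptions.

```python
import torch

def sample_gaussian_mixture(weights, means, stds):
    """Draw one sample from a per-coordinate Gaussian mixture.

    weights, means, stds: tensors of shape (batch, K, *dims); weights are
    assumed nonnegative over the component axis K.
    """
    b, k = weights.shape[0], weights.shape[1]
    # Pick a mixture component independently for every coordinate.
    flat_w = weights.reshape(b, k, -1).permute(0, 2, 1).reshape(-1, k)  # (b * coords, K)
    idx = torch.multinomial(flat_w, 1).reshape(b, 1, -1)                # (b, 1, coords)
    # Gather the chosen component's mean and std per coordinate.
    mu = torch.gather(means.reshape(b, k, -1), 1, idx).reshape(b, *means.shape[2:])
    sigma = torch.gather(stds.reshape(b, k, -1), 1, idx).reshape(b, *stds.shape[2:])
    # Reparameterized Gaussian sample from the selected component.
    return mu + sigma * torch.randn_like(mu)

# Toy usage: a batch of 4 CIFAR10-sized tensors with K = 2 components.
w = torch.full((4, 2, 3, 32, 32), 0.5)   # equal mixture weights
m = torch.randn(4, 2, 3, 32, 32)         # component means
s = torch.rand(4, 2, 3, 32, 32) + 0.1    # component standard deviations
x_prev = sample_gaussian_mixture(w, m, s)
print(x_prev.shape)                      # torch.Size([4, 3, 32, 32])
```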
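The experiment-setup row quotes an EMA rate of 0.9999 over the noise-network weights. Below is a generic PyTorch EMA-update sketch under that assumption; the `EMA` class and the training-loop names are placeholders rather than the authors' code, and only the 0.9999 decay is taken from the paper.

```python
import copy
import torch

class EMA:
    """Exponential moving average of a model's parameters (decay quoted in the paper: 0.9999)."""

    def __init__(self, model, decay=0.9999):
        self.decay = decay
        # Frozen copy of the model; typically used for evaluation and sampling.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

# Sketch of where this sits in a training loop (all names are placeholders):
# ema = EMA(noise_net, decay=0.9999)
# for batch in loader:
#     loss = compute_loss(noise_net, batch)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
#     ema.update(noise_net)   # sample / report FID with ema.shadow
```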