Diffusion Sampling with Momentum for Mitigating Divergence Artifacts

Authors: Suttisak Wizadwongsa, Worameth Chinchuthakun, Pramook Khungurn, Amit Raj, Supasorn Suwajanakorn

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our techniques are highly effective in reducing artifacts and improving image quality, surpassing state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models for low-step sampling."
Researcher Affiliation | Collaboration | Suttisak Wizadwongsa (VISTEC, Thailand), suttisak.w_s19@vistec.ac.th; Worameth Chinchuthakun (Tokyo Institute of Technology, Japan), chinchuthakun.w.aa@m.titech.ac.jp; Pramook Khungurn (pixiv Inc.), pramook@gmail.com; Amit Raj (Google), amitrajs@google.com; Supasorn Suwajanakorn (VISTEC, Thailand), supasorn.s@vistec.ac.th
Pseudocode | Yes | Algorithm 1: PLMS step with HB momentum; Algorithm 2: GHVB step
Open Source Code | No | The paper mentions obtaining and using code from various official GitHub repositories for its experiments (e.g., DPM-Solver, DiT, MDT), but it does not state that its *own* methods (the HB and GHVB techniques) have been open-sourced via a direct link or an explicit release statement.
Open Datasets | Yes | "We use Stable Diffusion (Rombach et al., 2022), the COCO dataset (Lin et al., 2014), ImageNet256 (Russakovsky et al., 2015), ADM from Peebles & Xie (2022), DiT-XL (Peebles & Xie, 2022), and MDT (Gao et al., 2023)," as well as a face dataset accessible on Kaggle.
Dataset Splits | No | The paper mentions which datasets it uses for experiments and evaluation, but it does not provide training/validation/test split details such as percentages, sample counts, or a predefined split methodology. It evaluates generated samples against reference solutions rather than partitioning data in the standard way for reproducibility.
Hardware Specification | Yes | "The experiment was done on four NVIDIA RTX A4000 GPUs and a 24-core AMD Threadripper 3960X CPU." "The experiment was done on four NVIDIA GeForce RTX 2080 Ti GPUs and a 24-core AMD Threadripper 3960X CPU." "All experiments were conducted on four NVIDIA RTX A4000 GPUs." "Table 12: Comparison of the average sampling time per image (in seconds) when using different numbers of steps in Stable Diffusion 1.5 on a single NVIDIA GeForce RTX 3080."
Software Dependencies | No | The paper refers to various diffusion models and solvers by name with citations (e.g., DPM-Solver++, PLMS4, UniPC) and mentions obtaining code from specific GitHub repositories, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.y, CUDA z.w) needed to reproduce the experimental environment.
Experiment Setup | Yes | "In this experiment, we apply our HB and GHVB techniques to the most popular 2nd- and 4th-order solvers, DPM-Solver++ (Lu et al., 2022a) and PLMS4 (Liu et al., 2022a), using 15 sampling steps and various guidance scales on three different text-to-image diffusion models. The results using τ = 3 (magnitude considered high when above 3 std.) are shown in Figures 7 and 8." "We generate 160 samples from the same set of text prompts and seeds for each method from a fine-tuned Stable Diffusion model called Anything V4." "The comparison is done on Stable Diffusion 1.5 with the target results obtained from a 1,000-step DDIM method." "Specifically, we prepared 150 sets of inputs, i.e., text prompt, reference image, random seed, and a fixed guidance scale of 7.5 and style fidelity of 1.0 (see 16), and generated the target results by using a 1,000-step PLMS4 (Liu et al., 2022a)."