Diffusion Sampling with Momentum for Mitigating Divergence Artifacts

Authors: Suttisak Wizadwongsa, Worameth Chinchuthakun, Pramook Khungurn, Amit Raj, Supasorn Suwajanakorn

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our techniques are highly effective in reducing artifacts and improving image quality, surpassing state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models for low-step sampling."
Researcher Affiliation | Collaboration | Suttisak Wizadwongsa (VISTEC, Thailand), suttisak.w_s19@vistec.ac.th; Worameth Chinchuthakun (Tokyo Institute of Technology, Japan), chinchuthakun.w.aa@m.titech.ac.jp; Pramook Khungurn (pixiv Inc.), pramook@gmail.com; Amit Raj (Google), amitrajs@google.com; Supasorn Suwajanakorn (VISTEC, Thailand), supasorn.s@vistec.ac.th
Pseudocode | Yes | Algorithm 1: PLMS step with HB momentum; Algorithm 2: GHVB step
Open Source Code | No | The paper mentions obtaining and using code from various official GitHub repositories for its experiments (e.g., DPM-Solver, DiT, MDT), but it does not state that its *own* methods (the HB and GHVB techniques) have been open-sourced via a direct link or an explicit release statement.
Open Datasets | Yes | "We use Stable Diffusion (Rombach et al., 2022), the COCO dataset (Lin et al., 2014), ImageNet256 (Russakovsky et al., 2015), ADM from Peebles & Xie (2022), DiT-XL (Peebles & Xie, 2022), and MDT (Gao et al., 2023)," as well as a face dataset accessible on Kaggle.
Dataset Splits | No | The paper mentions which datasets it uses for experiments and evaluation, but it does not provide training/validation/test split details such as percentages, sample counts, or a predefined split methodology. It evaluates generated samples against reference solutions rather than partitioning data in the standard way for reproducibility.
Hardware Specification | Yes | "The experiment was done on four NVIDIA RTX A4000 GPUs and a 24-core AMD Threadripper 3960X CPU." "The experiment was done on four NVIDIA GeForce RTX 2080 Ti GPUs and a 24-core AMD Threadripper 3960X CPU." "All experiments were conducted on four NVIDIA RTX A4000 GPUs." "Table 12: Comparison of the average sampling time per image (in seconds) when using different numbers of steps in Stable Diffusion 1.5 on a single NVIDIA GeForce RTX 3080."
Software Dependencies | No | The paper refers to various diffusion models and solvers by name with citations (e.g., DPM-Solver++, PLMS4, UniPC) and mentions obtaining code from specific GitHub repositories, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.y, CUDA z.w) needed to reproduce the experimental environment.
Experiment Setup | Yes | "In this experiment, we apply our HB and GHVB techniques to the most popular 2nd- and 4th-order solvers, DPM-Solver++ (Lu et al., 2022a) and PLMS4 (Liu et al., 2022a), using 15 sampling steps and various guidance scales on three different text-to-image diffusion models. The results using τ = 3 (magnitude considered high when above 3 std.) are shown in Figures 7 and 8." "We generate 160 samples from the same set of text prompts and seeds for each method from a fine-tuned Stable Diffusion model called Anything V4." "The comparison is done on Stable Diffusion 1.5 with the target results obtained from a 1,000-step DDIM method." "Specifically, we prepared 150 sets of inputs, i.e., text prompt, reference image, random seed, and a fixed guidance scale of 7.5 and style fidelity of 1.0 (see 16), and generated the target results by using a 1,000-step PLMS4 (Liu et al., 2022a)."