Rethinking Peculiar Images by Diffusion Models: Revealing Local Minima’s Role

Authors: Jinhyeok Jang, Chan-Hyun Youn, Minsu Jeon, Changha Lee

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show momentum effectively prevents peculiar image generation without extra computation. We hypothesize and empirically demonstrate that peculiar image generation is akin to the local minima problem in optimization. Our experiments demonstrate the effectiveness of both momentum strategies in alleviating the production of peculiar artifacts and efficiently generating reasonable images.
Researcher Affiliation | Academia | KAIST {jjh6297, chyoun, msjeon, changha.lee}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1: Formulation of GEM for a variable u; Algorithm 2: Momentum for diffusion sampling
Open Source Code | Yes | Our source code is available (footnote 1 links to https://github.com/jjh6297/momentum-diffusion-sampling).
Open Datasets | Yes | We conducted an analysis using pre-trained diffusion models on the CIFAR10 and CelebA datasets (Liu et al. 2015), without any further training. CIFAR10 (Krizhevsky, Hinton et al. 2009). MS COCO dataset (Lin et al. 2014).
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It states that pre-trained models were used for analysis, and it evaluates generated images rather than model training performance on specific splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU model, CPU type, memory).
Software Dependencies | No | The paper mentions using
Experiment Setup | Yes | The paper details specific experimental parameters such as the momentum parameter β and the number of steps T (NFE) for evaluation: "the results unveil that for small step sizes (T ≤ 25), a negative β worsens FID. However, negative β effectively improves FID for all cases where T > 25. Moreover, a distinct pattern emerges: the optimal β is proportional to 1/T." and "we applied the momentum (β = 0.3) to a pretrained stable diffusion model with DDIM sampling and T = 100."
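
To make the role of the momentum parameter β in the Experiment Setup row more concrete, the following is a minimal sketch of heavy-ball style momentum wrapped around a generic iterative sampler. It is an illustration under stated assumptions, not the paper's Algorithm 2: the names `sample_with_momentum`, `step_fn`, and `toy_step` are hypothetical, the state is a single float rather than an image tensor, and the toy update stands in for a real sampler step such as DDIM.

```python
from typing import Callable, List


def sample_with_momentum(
    x0: float,
    step_fn: Callable[[float, int], float],
    num_steps: int,
    beta: float,
) -> List[float]:
    """Run an iterative sampler with heavy-ball style momentum.

    step_fn(x, t) is a hypothetical stand-in for one sampler update
    and returns the change the plain sampler would apply at step t.
    This is an assumed formulation, not the paper's exact algorithm.
    """
    x = x0
    velocity = 0.0
    trajectory = [x]
    for t in range(num_steps):
        delta = step_fn(x, t)               # update proposed by the plain sampler
        velocity = beta * velocity + delta  # blend in the previous velocity, scaled by beta
        x = x + velocity
        trajectory.append(x)
    return trajectory


def toy_step(x: float, t: int) -> float:
    """Toy update: move 10% of the remaining distance toward a target of 1.0."""
    return 0.1 * (1.0 - x)


# beta = 0 recovers the plain sampler; positive beta amplifies consecutive
# updates in the same direction, negative beta damps them.
plain = sample_with_momentum(0.0, toy_step, num_steps=100, beta=0.0)
positive = sample_with_momentum(0.0, toy_step, num_steps=100, beta=0.3)
negative = sample_with_momentum(0.0, toy_step, num_steps=100, beta=-0.3)
```

In this toy setting, a positive β takes larger early steps than the plain loop and a negative β takes smaller ones, which gives some intuition for why the sign and magnitude of β might interact with the number of steps T.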