Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search
Authors: Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that SAGE can effectively identify a variety of failure cases of different TDMs and GAN-based models. Then, with SAGE, we give a comprehensive analysis of SOTA generative models, and investigate in detail four typical failure modes across all TDMs and GAN-based models, including GLIDE (Nichol et al., 2021), Stable Diffusion V1.5/V2.1 (Rombach et al., 2022), DeepFloyd (Konstantinov et al., 2023), and StyleGAN-XL (Sauer et al., 2022): |
| Researcher Affiliation | Collaboration | Qihao Liu¹, Adam Kortylewski²,³, Yutong Bai¹, Song Bai⁴, Alan Yuille¹ — ¹Johns Hopkins University, ²University of Freiburg, ³Max Planck Institute for Informatics, ⁴ByteDance Inc. |
| Pseudocode | Yes | A.2 PSEUDOCODE: We provide the pseudocode of the gradient-guided search for text prompts in Algorithm 1 ("Gradient-guided search for text prompts"). |
| Open Source Code | Yes | Project page: https://sage-diffusion.github.io |
| Open Datasets | Yes | Object categories. In Sec. 4.1 of the main paper, to demonstrate the effectiveness of SAGE, we select 20 common object categories as the key objects. They include cat, dog, bird, fish, horse, car, plane, train, ship, laptop, chair, bike, television, bear, monkey, sheep, cow, cock, snake, and butterfly. For each category, we consider all relevant subcategories in ImageNet-1k as the correct category when building the optimization targets. |
| Dataset Splits | No | The paper describes its experimental setup and evaluation but does not provide explicit training, validation, or test dataset splits for the main experiments, nor does it detail how such splits were defined, if any were used. |
| Hardware Specification | Yes | Experiments involving LLaMA and experiments of baselines were conducted on a single A100 GPU, while other experiments were on a single V100 GPU. |
| Software Dependencies | No | The paper mentions specific LLM models like 'LLaMA 7B (Touvron et al., 2023)' and libraries like the 'timm library (Wightman, 2019)' but does not provide explicit version numbers for general software dependencies such as PyTorch, TensorFlow, or CUDA, which are typically required for reproducibility. |
| Experiment Setup | Yes | For the search over the latent space, we search for at most 500 iterations with α = 5×10⁻² and constrain the perturbation dz to lie in the range dz ∈ [−1, 1]. We add a residual connection at denoising step t = 20 with weight ω = 0.9. For the search over the token embedding space, we search for at most 250 iterations with α = 1×10⁻³ and constrain the embedding τ to lie in the range τ ∈ [−2.5, 2.5]; we use the gradient from denoising step t = 5. For the search over the prompt space, we search for at most 100 iterations for each word, with α = 5×10⁻² and λ = 0.1, and constrain the embedding τ to lie in the range τ ∈ [−2.5, 2.5]; we use the gradient from denoising step t = 5. For each input prompt, LLaMA gives at least k = 100 candidates and the maximal length of the generated text is m = 10. (A hedged sketch of the latent-space search loop follows this table.) |
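
The latent-space hyperparameters above map naturally onto a projected-gradient search loop. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `generate` stands in for a differentiable (truncated-backprop) diffusion sampler, `failure_loss` for a discriminative objective that scores how badly the output matches the prompt, and the sign-based update is a common PGD-style choice that the paper may not use verbatim.

```python
import torch

def latent_space_search(z0, generate, failure_loss,
                        alpha=5e-2, max_iters=500, bound=1.0):
    """PGD-style search for a latent perturbation dz that exposes a failure.

    Assumptions (not from the paper): `generate(z)` differentiably maps a
    latent to an image; `failure_loss(image)` is higher when the generation
    fails (e.g., the key object is missing). Hyperparameters follow the
    reported setup: alpha = 5e-2, at most 500 iterations, dz in [-1, 1].
    """
    dz = torch.zeros_like(z0, requires_grad=True)
    for _ in range(max_iters):
        loss = failure_loss(generate(z0 + dz))
        loss.backward()
        with torch.no_grad():
            # Ascend the failure objective, then project dz back into the box.
            dz += alpha * dz.grad.sign()
            dz.clamp_(-bound, bound)
            dz.grad.zero_()
    return (z0 + dz).detach()
```

In SAGE itself, gradients reach the latent through an approximation over the denoising chain (the residual connection at t = 20 with weight ω = 0.9 noted in the setup row); the sketch instead treats the sampler as a black-box differentiable function, which is the main simplification here.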