Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search
Authors: Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that SAGE can effectively identify a variety of failure cases of different TDMs and GAN-based models. Then, with SAGE, we give a comprehensive analysis of SOTA generative models, and investigate in detail four typical failure modes across all TDMs and GAN-based models, including GLIDE (Nichol et al., 2021), Stable Diffusion V1.5/V2.1 (Rombach et al., 2022), DeepFloyd (Konstantinov et al., 2023), and StyleGAN-XL (Sauer et al., 2022): |
| Researcher Affiliation | Collaboration | Qihao Liu¹, Adam Kortylewski²,³, Yutong Bai¹, Song Bai⁴, Alan Yuille¹ — ¹Johns Hopkins University, ²University of Freiburg, ³Max Planck Institute for Informatics, ⁴ByteDance Inc. |
| Pseudocode | Yes | A.2 PSEUDOCODE: We provide the pseudocode of the gradient-guided search for text prompts in Algorithm 1 ("Gradient-guided search for text prompts"). |
| Open Source Code | Yes | Project page: https://sage-diffusion.github.io |
| Open Datasets | Yes | Object categories. In Sec. 4.1 of the main paper, to demonstrate the effectiveness of SAGE, we select 20 common object categories as the key objects. They include cat, dog, bird, fish, horse, car, plane, train, ship, laptop, chair, bike, television, bear, monkey, sheep, cow, cock, snake, and butterfly. For each category, we consider all relevant subcategories in ImageNet-1k as the correct category when building the optimization targets. |
| Dataset Splits | No | The paper describes its experimental setup and evaluation but does not provide explicit training, validation, or test dataset splits for the main experiments, nor does it detail how such splits were defined, if any were used. |
| Hardware Specification | Yes | Experiments involving LLaMA and experiments of baselines were conducted on a single A100 GPU, while other experiments were on a single V100 GPU. |
| Software Dependencies | No | The paper mentions specific LLM models like 'LLaMA 7B (Touvron et al., 2023)' and libraries like the 'timm library (Wightman, 2019)' but does not provide explicit version numbers for general software dependencies such as PyTorch, TensorFlow, or CUDA, which are typically required for reproducibility. |
| Experiment Setup | Yes | For the search over the latent space, we search for at most 500 iterations with α = 5×10⁻² and constrain the perturbation dz to lie in the range dz ∈ [−1, 1]. We add a residual connection at denoising step t = 20 with weight ω = 0.9. For the search over the token embedding space, we search for at most 250 iterations with α = 1×10⁻³ and constrain the embedding τ to lie in the range τ ∈ [−2.5, 2.5]; we use the gradient from denoising step t = 5. For the search over the prompt space, we search for at most 100 iterations for each word, with α = 5×10⁻² and λ = 0.1, and constrain the embedding τ to lie in the range τ ∈ [−2.5, 2.5]; we use the gradient from denoising step t = 5. For each input prompt, LLaMA gives at least k = 100 candidates and the maximal length of the generated text is m = 10. (A hedged sketch of the latent-space search loop follows this table.) |
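
The latent-space hyperparameters above map naturally onto a projected-gradient search loop. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `generate` stands in for a differentiable (truncated-backprop) diffusion sampler, `failure_loss` for a discriminative objective that scores how badly the output matches the prompt, and the sign-based update is a common PGD-style choice that the paper may not use verbatim.

```python
import torch

def latent_space_search(z0, generate, failure_loss,
                        alpha=5e-2, max_iters=500, bound=1.0):
    """PGD-style search for a latent perturbation dz that exposes a failure.

    Assumptions (not from the paper): `generate(z)` differentiably maps a
    latent to an image; `failure_loss(image)` is higher when the generation
    fails (e.g., the key object is missing). Hyperparameters follow the
    reported setup: alpha = 5e-2, at most 500 iterations, dz in [-1, 1].
    """
    dz = torch.zeros_like(z0, requires_grad=True)
    for _ in range(max_iters):
        loss = failure_loss(generate(z0 + dz))
        loss.backward()
        with torch.no_grad():
            # Ascend the failure objective, then project dz back into the box.
            dz += alpha * dz.grad.sign()
            dz.clamp_(-bound, bound)
            dz.grad.zero_()
    return (z0 + dz).detach()
```

In SAGE itself, gradients reach the latent through an approximation over the denoising chain (the residual connection at t = 20 with weight ω = 0.9 noted in the setup row); the sketch instead treats the sampler as a black-box differentiable function, which is the main simplification here.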