Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

Authors: Zhi-Yi Chin, Chieh Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments based on the Inappropriate Image Prompts (I2P) dataset reveal that around half of the prompts originally handled by the existing safety mechanisms can be manipulated by our P4D into problematic ones. Quantitative results and some qualitative examples are reported in Table 2 and Figure 3, respectively.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; (2) IBM Research, NY 10598, USA.
Pseudocode | No | The paper describes the P4D procedure in numbered steps but does not present it as a formal pseudocode block or a labeled algorithm.
Open Source Code | Yes | Our codes are publicly available at https://github.com/joycenerd/P4D
Open Datasets | Yes | For the concept-related dataset, we focus on the Inappropriate Image Prompts (I2P) dataset (Schramowski et al., 2023)... For the object-related datasets, we utilize the car and French-horn classes from ESD (Gandikota et al., 2023)... pre-trained ResNet-18 classifier (Ma, 2021) from the Imagenette dataset (Howard, 2019)... YOLOv5 vehicle detector (Boneh, 2023)... COCO (Lin et al., 2014)...
Dataset Splits | No | The paper describes how prompts were selected for its 'filtered dataset' and how evaluation was performed on generated images, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for its own P4D method.
Hardware Specification | Yes | Our P4D debugging process leverages two A5000 GPUs, each equipped with 24 GB of CUDA memory, to facilitate the execution of our red-teaming methodology, which integrates two distinct T2I models (unconstrained and safe).
Software Dependencies | No | The paper mentions 'AdamW (Loshchilov & Hutter, 2018) as the optimizer', the 'Stable Diffusion v1-4' and 'Stable Diffusion v2-0' model backbones, 'MiniLM (Wang et al., 2020)', and 'CLIP (Radford et al., 2021)'. However, it does not provide specific version numbers for these software dependencies (e.g., Python version, PyTorch version, or specific library versions).
Experiment Setup | Yes | We set N = 16 and K = 3 for our P4D-N and P4D-K, respectively. We set the batch size to 1, the learning rate to 0.1, and the weight decay to 0.1, and use AdamW (Loshchilov & Hutter, 2018) as the optimizer. All the prompts P_cont are optimized with 3000 gradient update steps. The number of inference steps is set to 25; the random seed aligns with the source dataset, and the guidance scale is set to 7.5 if not specified in the dataset.
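The hyperparameters in the Experiment Setup row are enough to assemble into a single configuration object for a re-implementation. A minimal sketch in Python (the class name `P4DConfig` and its field names are our own; only the values come from the paper, and `guidance_scale` encodes the paper's "7.5 if not specified in the dataset" rule):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class P4DConfig:
    # Values reported in the paper's experiment setup; field names are ours.
    n_tokens: int = 16              # N, for the P4D-N variant
    k_interval: int = 3             # K, for the P4D-K variant
    batch_size: int = 1
    learning_rate: float = 0.1
    weight_decay: float = 0.1
    optimizer: str = "AdamW"        # AdamW (Loshchilov & Hutter, 2018)
    grad_steps: int = 3000          # gradient updates per prompt P_cont
    inference_steps: int = 25       # diffusion sampling steps
    default_guidance_scale: float = 7.5

    def guidance_scale(self, dataset_value: Optional[float] = None) -> float:
        """Use the dataset-specified guidance scale when given, else 7.5."""
        if dataset_value is not None:
            return dataset_value
        return self.default_guidance_scale


cfg = P4DConfig()
```

A frozen dataclass keeps the run configuration immutable and hashable, which makes it easy to log alongside results when reproducing the reported numbers.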