ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation
Authors: Yizhuo Ma, Shanmin Pang, Qi Guo, Tianyu Wei, Qing Guo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on three datasets and compare it to two baseline methods. Our method could generate unsafe content through two commercial deep generation models, including GPT-4 and DALL·E 2. |
| Researcher Affiliation | Academia | Yizhuo Ma1, Shanmin Pang1, Qi Guo1, Tianyu Wei1, Qing Guo2; 1 School of Software Engineering, Xi'an Jiaotong University; 2 IHPC and CFAR, Agency for Science, Technology and Research, Singapore; {yizhuoma@stu., pangsm@, gq19990314@stu., Yangyy0318@stu.}xjtu.edu.cn, tsingqguo@ieee.org |
| Pseudocode | No | The paper describes the system components and their interactions using prose and diagrams (e.g., Figure 2), but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/tsingqguo/coljailbreak |
| Open Datasets | Yes | Datasets. To conduct a comprehensive evaluation of our proposed method, the sources of our datasets include both publicly available datasets and a dataset we curated. We refer to the concepts of the Inappropriate Image Prompts (I2P) dataset [36], an established dataset specifically designed for inappropriate prompts, focusing on harassment, violence, self-harm, shocking content, and illegal activities. Specifically, we extract 105 prompts sourced from the I2P and VBCDE-100 [13] datasets, distributed across four categories: violence, self-harm, harassment, and nudity. Additionally, we curate a dataset named the Unsafe Edit dataset, which includes 100 inappropriate prompts. |
| Dataset Splits | No | The paper describes the datasets used (I2P, VBCDE-100, and a curated Unsafe Edit dataset), and mentions evaluating on them. However, it does not explicitly state the specific train/validation/test splits (e.g., percentages or sample counts) used for the experiments. |
| Hardware Specification | Yes | All experiments are performed using two NVIDIA A100 40GB GPUs. The overall duration of all the experiments in the paper was about six weeks. |
| Software Dependencies | No | The paper refers to various models and tools such as ChatGPT [1], DALL·E 2 [3], GPT-4 [5], NudeNet [6], the Q16 classifier [37], SAM [20], CLIP, Fooocus-Inpainting [4], SD-Inpainting [7], and ControlNet-v1.1-sd1.5-Inpainting [2]. While it cites these, it does not provide specific version numbers for these software components or for general programming-language/library dependencies (e.g., Python or PyTorch versions). |
| Experiment Setup | Yes | In C.2 (Details of Baselines), the paper specifies hyperparameters for MMA-Diffusion, including the random seed (7867), the number of optimization iterations (500), and the number of adversarial prompts (10). In C.3 (Details of Defense Models), it provides configuration sets for the SLD variants in Table 3. In C.4 (Details of ColJailBreak), it lists Fooocus-Inpainting hyperparameters including guidance scale (7.5), num_inference_steps (50), and strength (0.9999); a hedged sketch of how such settings map onto a generic inpainting pipeline is shown below the table. |
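
The sketch below only illustrates how the reported inpainting hyperparameters (guidance scale 7.5, 50 inference steps, strength 0.9999) would be passed to a generic Stable Diffusion inpainting pipeline from the `diffusers` library. It is an assumption-laden stand-in, not the paper's ColJailBreak pipeline: the checkpoint name, image paths, prompt, and seed usage here are placeholders, and the paper itself uses Fooocus-Inpainting rather than this pipeline class.

```python
# Minimal sketch, assuming the `diffusers` StableDiffusionInpaintPipeline as a
# stand-in for Fooocus-Inpainting. Only the hyperparameter values (7.5, 50,
# 0.9999) come from the paper's Appendix C.4; everything else is illustrative.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed checkpoint, not taken from the paper
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("safe_generation.png").convert("RGB")   # placeholder input image
mask_image = Image.open("edit_region_mask.png").convert("L")    # placeholder edit-region mask

result = pipe(
    prompt="<editing prompt>",   # placeholder; the paper's prompts are not reproduced here
    image=init_image,
    mask_image=mask_image,
    guidance_scale=7.5,          # value reported in C.4
    num_inference_steps=50,      # value reported in C.4
    strength=0.9999,             # value reported in C.4
    # Fixed seed for repeatability; note the seed 7867 in C.2 refers to the
    # MMA-Diffusion baseline, not to this inpainting step.
    generator=torch.Generator("cuda").manual_seed(7867),
).images[0]

result.save("edited.png")
```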