ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users

Authors: Guanlin Li, Kangjie Chen, Shudong Zhang, Jie Zhang, Tianwei Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With our comprehensive experiments, we reveal the toxicity of the popular open-source text-to-image models. The experiments also validate the effectiveness, adaptability, and great diversity of ART.
Researcher Affiliation | Academia | Guanlin Li1, Kangjie Chen1, Shudong Zhang2, Jie Zhang3, Tianwei Zhang1. 1Nanyang Technological University, 2Xidian University, 3CFAR and IHPC, A*STAR.
Pseudocode | No | The paper describes the methodology in text and provides figures rather than pseudocode.
Open Source Code | Yes | Datasets and models can be found in https://github.com/GuanlinLee/ART.
Open Datasets | Yes | Additionally, we introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models. Datasets and models can be found in https://github.com/GuanlinLee/ART.
Dataset Splits | No | For LD, we adopt the Guide Model to generate 31,086 data items for the training set and 1,646 data items for the test set.
Hardware Specification | Yes | We adopt 4 RTX A6000 (48GB) GPUs to fine-tune these models, and 4 RTX A6000 GPUs during the inference phase. The Judge Models share one GPU; the Writer Model, the Guide Model, and the T2I Model each occupy one GPU.
Software Dependencies | No | The paper mentions specific models but does not specify software dependencies or versions.
Experiment Setup | Yes | If there are no special instructions, we set the guidance scale as 7.5 and use the default settings for other hyperparameters based on diffusers [45]. All training details can be found in Appendix F.
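The guidance scale of 7.5 cited in the experiment-setup row is the classifier-free guidance strength used by diffusers-style samplers. As a minimal sketch of the underlying arithmetic (the function name and toy values below are illustrative, not from the paper): at each denoising step the sampler blends an unconditional and a prompt-conditioned noise prediction, pushing the result away from the unconditional one by the guidance scale.

```python
def apply_cfg(uncond_pred, cond_pred, guidance_scale=7.5):
    """Classifier-free guidance on per-element noise predictions:
    guided = uncond + scale * (cond - uncond).
    A scale of 1.0 recovers the conditional prediction; larger values
    (e.g. the paper's default 7.5) follow the prompt more strongly."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(uncond_pred, cond_pred)]

# Toy values standing in for a U-Net's unconditional / conditional outputs.
uncond = [0.0, 0.0, 0.0, 0.0]
cond = [1.0, 1.0, 1.0, 1.0]

guided = apply_cfg(uncond, cond)  # scale 7.5, the paper's default
print(guided)  # each element is 0 + 7.5 * (1 - 0) = 7.5
```

In diffusers this blending happens inside the pipeline's denoising loop; the `guidance_scale` argument to the pipeline call controls it directly.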