Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

Authors: Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DUO can robustly defend against various state-of-the-art red-teaming methods without significant performance degradation on unrelated topics, as measured by FID and CLIP scores.
Researcher Affiliation | Collaboration | 1. Department of Physics Education, Seoul National University; 2. School of Industrial and Management Engineering, Korea University; 3. NAVER AI Lab; 4. NAVER Cloud; 5. Korea Institute for Advanced Study (KIAS); 6. AI Institute of Seoul National University (SNU AIIS)
Pseudocode | No | The paper does not include a pseudocode or algorithm block.
Open Source Code | No | The paper only promises a future release ("We will publicly open the source-code for reproducib[ility]"); no repository link is provided.
Open Datasets | Yes | To evaluate model performance unrelated to the unlearned concept, we measure the FID [19] and CLIP scores [18] using the MS-COCO 30k validation dataset [30].
Dataset Splits | Yes | The same passage names the split used: evaluation is run on the validation split of MS-COCO (30k images) for both FID and CLIP scores.
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions Stable Diffusion v1.4, LoRA, and the Adam optimizer, but does not give version numbers for key software dependencies such as deep-learning frameworks (e.g., PyTorch, TensorFlow) or other libraries beyond the "Python 3.8" noted in the checklist.
Experiment Setup | Yes | We use Stable Diffusion v1.4 (SD1.4) with a LoRA [23, 48] rank of 32 and the Adam optimizer for fine-tuning. For generating the unsafe images x, we use "naked" as the prompt with a guidance scale of 7.5. When we use SDEdit, the magnitude of the added noise is t = 0.75T, where T is the maximum number of diffusion timesteps, and the guidance scale is 7.5. Using β = 100 as a baseline, we use a learning rate of 3 × 10⁻⁴ and a batch size of 4.