Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Authors: Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that DUO can robustly defend against various state-of-the-art red teaming methods without significant performance degradation on unrelated topics, as measured by FID and CLIP scores. |
| Researcher Affiliation | Collaboration | 1 Department of Physics Education, Seoul National University; 2 School of Industrial and Management Engineering, Korea University; 3 NAVER AI Lab; 4 NAVER Cloud; 5 Korea Institute for Advanced Study (KIAS); 6 AI Institute of Seoul National University (SNU AIIS) |
| Pseudocode | No | The paper does not include a pseudocode or algorithm block. |
| Open Source Code | No | The paper states only that the authors "will publicly open the source-code" for reproducibility; no repository link is provided. |
| Open Datasets | Yes | To evaluate model performance unrelated to the unlearned concept, we measure the FID [19] and CLIP scores [18] using MS COCO 30k validation dataset [30]. |
| Dataset Splits | Yes | To evaluate model performance unrelated to the unlearned concept, we measure the FID [19] and CLIP scores [18] using MS COCO 30k validation dataset [30]. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for experiments. |
| Software Dependencies | No | The paper mentions Stable Diffusion v1.4, LoRA, and the Adam optimizer, but does not provide version numbers for key software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or other libraries beyond 'Python 3.8' from the checklist. |
| Experiment Setup | Yes | We use Stable Diffusion v1.4 (SD v1.4) with a LoRA [23, 48] rank of 32 and the Adam optimizer for fine-tuning. For generating the unsafe images x, we use "naked" as the prompt with a guidance strength of 7.5. When we use SDEdit, the magnitude of the added noise is t = 0.75T, where T is the maximum number of diffusion timesteps, and the guidance scale is 7.5. With β = 100 as a baseline, we use a learning rate of 3 × 10⁻⁴, a batch size of 4, and a LoRA rank of 32. |
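The experiment-setup row above lists concrete hyperparameters. As a minimal sketch, the configuration can be collected into one place; the class name, field names, and the T = 1000 default are illustrative assumptions, not the authors' code, and only the quoted values (LoRA rank 32, Adam, lr 3 × 10⁻⁴, batch size 4, β = 100, guidance 7.5, SDEdit noise at t = 0.75T) come from the paper.

```python
# Hedged sketch of the DUO fine-tuning configuration quoted in the review
# table above. Structure and naming are hypothetical; values are from the paper.
from dataclasses import dataclass


@dataclass
class DUOConfig:
    base_model: str = "stable-diffusion-v1-4"  # SD v1.4
    lora_rank: int = 32                        # LoRA rank used for fine-tuning
    optimizer: str = "Adam"
    learning_rate: float = 3e-4
    batch_size: int = 4
    beta: float = 100.0                        # baseline beta from the paper
    guidance_scale: float = 7.5                # classifier-free guidance strength
    unsafe_prompt: str = "naked"               # prompt used to generate unsafe images x
    sdedit_noise_frac: float = 0.75            # SDEdit noise magnitude, t = 0.75T

    def sdedit_start_step(self, max_timesteps: int = 1000) -> int:
        """Diffusion timestep at which SDEdit injects noise (t = 0.75T).

        max_timesteps defaults to 1000, the usual T for SD v1.4; the paper
        only specifies the fraction, not T itself.
        """
        return int(self.sdedit_noise_frac * max_timesteps)


cfg = DUOConfig()
print(cfg.sdedit_start_step())  # 750 when T = 1000
```

Keeping the hyperparameters in a single dataclass like this makes a reproduction attempt easier to audit against the paper's reported setup.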