Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Removing Concepts from Text-to-Image Models with Only Negative Samples

Authors: Hanwen Liu, Yadong Mu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on text-to-image show that Clipout is simple yet highly effective and efficient compared with previous state-of-the-art approaches. 4 Evaluation: We assess Clipout on a variety of tasks and datasets: a) the face datasets CelebA-HQ (Karras et al., 2018) and VGGFace2 (Cao et al., 2018); b) LAION-5B (Schuhmann et al., 2022). Metrics. For text-to-image tasks, the metric of CLIP Score (Hessel et al., 2021) is used to check if the generated images match the prompt that describes them. We use Face Detection Failure Rate (FDFR) (Deng et al., 2020) and Identity Score Matching (ISM) (Deng et al., 2022) to measure the generated face quality compared with the training data. Table 1: Numerical results on VGGFace2 (Cao et al., 2018).
Researcher Affiliation | Academia | Hanwen Liu, Yadong Mu; Peking University; EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Unlearning the encoder via Clipout. Parameter: prompt x and encoder fθ(·). 1: for it = 1, iteration do; 2: Compute the data embedding z = fθ(x); 3: Sample minibatch of N masked samples from z; 4: Compute the contrastive loss ℓθ w.r.t. Eq. (1); 5: Update θ by descending the gradients ∇θ ℓθ; 6: end for; 7: return unlearned encoder parameters θ.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The data and codes are not released at submission time. For experimental result reproducibility, setting and details, please see Appendix B.
Open Datasets | Yes | We assess Clipout on a variety of tasks and datasets: a) the face datasets CelebA-HQ (Karras et al., 2018) and VGGFace2 (Cao et al., 2018), where the adversary could use personalized methods, e.g., Textual Inversion (Gal et al., 2023), to make models remember personal concepts and forge fake photos; b) LAION-5B (Schuhmann et al., 2022), where Stable Diffusion (Rombach et al., 2022) is pre-trained on and we make use of this dataset to evaluate built-in concept unlearning. We also performed experiments on the inappropriate image prompts (I2P) dataset (Schramowski et al., 2023). We benchmark 50 diverse MS-COCO prompts (Lin et al., 2014) (e.g., "a dog catching a frisbee") to check whether Clipout degrades the model's overall generative quality on a broader prompt set.
Dataset Splits | Yes | For datasets, we choose the face datasets CelebA-HQ (Karras et al., 2018) and VGGFace2 (Cao et al., 2018). These two face datasets are commonly used in deepfake or privacy related tasks (Liang et al., 2023; Le et al., 2023). We follow the same dataset split in the previous work (Le et al., 2023).
Hardware Specification | Yes | NVIDIA A40 with 48 GB GDDR6 memory is used to conduct most experiments.
Software Dependencies | Yes | We use PyTorch 1.13.1 with CUDA 11.6 on the Ubuntu operating system. All pre-trained weights are downloaded from the Hugging Face platform (Wolf et al., 2020).
Experiment Setup | Yes | We use Adam (Kingma and Ba, 2015) as the optimizer with a learning rate of 1.5 × 10⁻⁵ and perform unlearning for 200 epochs. The clipout rate is set as 0.25 by default. For diffusion models, we use Stable Diffusion 2.1 (Rombach et al., 2022), with CLIP (Radford et al., 2021; Cherti et al., 2023) as the text encoder. For numerical results, unless stated otherwise, we calculate these results w.r.t. different metrics based on 128 randomly generated images with 512 × 512 resolution for statistical significance.
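
For readers attempting a reproduction, the reported setup can be collected in one place. The sketch below is not the authors' code (which was not released at submission time). All field and function names are hypothetical. The masking function is only one possible reading of step 3 of Algorithm 1: it assumes a "masked sample" is a copy of the embedding z in which each entry is zeroed independently with probability equal to the clipout rate. The excerpts above do not confirm that interpretation, and the contrastive loss of Eq. (1) is not reproduced here because it is not quoted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipoutSetup:
    """Hyperparameters quoted in the table rows; field names are hypothetical."""
    optimizer: str = "Adam"
    learning_rate: float = 1.5e-5  # 1.5 x 10^-5, as read from the Experiment Setup row
    epochs: int = 200
    clipout_rate: float = 0.25
    diffusion_model: str = "Stable Diffusion 2.1"
    text_encoder: str = "CLIP"
    num_eval_images: int = 128
    image_resolution: int = 512
    pytorch_version: str = "1.13.1"
    cuda_version: str = "11.6"


def sample_masked(z, n_samples, rate, rng):
    """Return n_samples copies of z, each entry zeroed with probability `rate`.

    ASSUMPTION: dropout-style element masking is an illustrative guess at
    "Sample minibatch of N masked samples from z"; the paper may define
    the masking differently.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    return [
        [0.0 if rng.random() < rate else value for value in z]
        for _ in range(n_samples)
    ]


def example():
    """Build a small masked minibatch from a toy embedding."""
    setup = ClipoutSetup()
    rng = random.Random(0)  # fixed seed so repeated runs give the same masks
    z = [0.5, -1.2, 0.3, 2.0]  # toy stand-in for a text embedding
    return sample_masked(z, n_samples=8, rate=setup.clipout_rate, rng=rng)
```

Calling `example()` yields eight masked copies of the toy embedding. In a real run, z would come from the CLIP text encoder and the masked samples would feed the contrastive loss.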