Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

When Are Concepts Erased From Diffusion Models?

Authors: Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results shed light on the value of exploring concept erasure robustness outside of adversarial text inputs, and emphasize the importance of comprehensive evaluations for erasure in diffusion models. We present our evaluation suite and apply it to a representative set of existing erasure methods. Our findings reveal undiscovered behavior of models under these new evaluation contexts. For instance, models that appear robust under traditional input search techniques remain vulnerable when assessed from other perspectives. These observations emphasize the critical need for a comprehensive suite of evaluations, like the one we propose, to reliably assess the completeness and true effectiveness of any concept erasure method.
Researcher Affiliation | Academia | Kevin Lu¹, Nicky Kriplani², Rohit Gandikota¹, Minh Pham², David Bau¹, Chinmay Hegde², Niv Cohen²; ¹Northeastern University, ²New York University
Pseudocode | No | The paper describes methods using equations (e.g., Eq. 1, 2, 3, 4) and textual descriptions but does not include any distinct pseudocode or algorithm blocks.
Open Source Code | Yes | Source code and datasets can be found at kevinlu4588/When Are Concepts Erased.
Open Datasets | Yes | Source code and datasets can be found at kevinlu4588/When Are Concepts Erased. We construct binary datasets per concept from ImageNet-1k as follows: Positives: all samples belonging to the target ImageNet class (resolved via the label name lookup in the HF metadata).
Dataset Splits | Yes | The split is 90% train / 10% validation.
Hardware Specification | Yes | To train all the models, run the entire evaluation suite, and create the CLIP and classification metrics, we used two NVIDIA A6000 GPUs.
Software Dependencies | Yes | All similarity assessments were performed using CLIP ViT (openai/clip-vit-base-patch32). To assess whether erased concepts remain recognizable in generated images, we perform classification using a ResNet-50 model pretrained on the Imagenette dataset. The inpainting pipeline was based on Stable Diffusion 1.5 and implemented via Hugging Face's StableDiffusionInpaintPipeline. Images are mapped to Stable Diffusion latents using the SD v1.4 VAE (AutoencoderKL). We train with AdamW (lr 1 × 10⁻⁴, weight decay 10⁻³), batch size 8, gradient clipping at 1.0, for 10 epochs by default. The loss is BCEWithLogits.
Experiment Setup | Yes | For the English Springer Spaniel and Garbage Truck concepts, we reduced the number of fine-tuning steps to 10, while using 60 steps for all other concepts. To prevent degradation of the model's general utility, a known issue when applying GA over extended training, we adopt a conservative training configuration: a batch size of 5, gradient accumulation steps of 4, and a learning rate of 1 × 10⁻⁵. ESD-x & ESD-u: We fine-tuned for 200 steps using a learning rate of 2 × 10⁻⁵. UCE: We fine-tuned for 200 steps with an empty guiding concept and an erase scale of 1. Task Vector (TV): To get the fine-tuned model for computing task vectors, we fine-tuned each model on 500 images for 200 steps, using a learning rate of 1 × 10⁻⁵. We used batch size of 4 and gradient accumulation step of 4. For erasure, we set the editing strength α = 1.75. Textual Inversion: Training involved 100 images, optimized for 3000 steps using a learning rate of 5 × 10⁻⁴. UnlearnDiffAtk: The model was trained using a learning rate of 0.01 and a weight decay of 0.1, with the classifier parameter set to K = 3. Inference-Time Noising Probe: We searched over an evenly spaced set of 6 η values between 1.0 and 1.85: [1.0, 1.17, 1.34, 1.51, 1.68, 1.85]. Classifier Guidance Implementation: We train with AdamW (lr 1 × 10⁻⁴, weight decay 10⁻³), batch size 8, gradient clipping at 1.0, for 10 epochs by default. During sampling, we run the standard classifier-free guidance (CFG) pass to obtain ϵ_cfg and then inject the latent-classifier gradient... with guidance scale 7.5. During inference, we sweep over 24 values of s_clf and select the sample with the highest classification score for the target concept.
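
A few of the numbers quoted in the Experiment Setup row can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the authors' repository: the helper names `evenly_spaced` and `effective_batch_size` are hypothetical, and it assumes that the effective batch size is the per-step batch size multiplied by the number of gradient accumulation steps, which is the usual convention but is not stated explicitly in the quoted text.

```python
# Illustrative sketch only; helper names are hypothetical and the code
# is not from the paper's repository. Values come from the table above.

def evenly_spaced(start, stop, count):
    """Return `count` evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (count - 1)
    # Round to two decimals to match the precision quoted in the table.
    return [round(start + i * step, 2) for i in range(count)]


def effective_batch_size(batch_size, accumulation_steps):
    """Assumed convention: samples seen per optimizer update."""
    return batch_size * accumulation_steps


# Inference-Time Noising Probe: 6 eta values between 1.0 and 1.85.
eta_grid = evenly_spaced(1.0, 1.85, 6)

# Gradient ascent (GA) setup: batch size 5, accumulation steps 4.
ga_effective = effective_batch_size(5, 4)

# Task Vector (TV) fine-tuning: batch size 4, accumulation steps 4.
tv_effective = effective_batch_size(4, 4)

print(eta_grid)
print(ga_effective, tv_effective)
```

The computed grid reproduces the six η values listed in the row (a spacing of 0.17). Under the stated assumption, the effective batch sizes would be 20 for the GA configuration and 16 for the Task Vector configuration.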