Circumventing Concept Erasure Methods For Text-To-Image Generative Models
Authors: Minh Pham, Kelly O. Marshall, Niv Cohen, Govind Mittal, Chinmay Hegde
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we examine seven recently proposed concept erasure methods, and show that targeted concepts are not fully excised from any of these methods. Specifically, we devise an algorithm to learn special input word embeddings that can retrieve erased concepts from the sanitized models with no alterations to their weights. Our results highlight the brittleness of post hoc concept erasure methods, and call into question their use in the algorithmic toolkit for AI safety. |
| Researcher Affiliation | Academia | Minh Pham, Kelly O. Marshall, Niv Cohen, Govind Mittal & Chinmay Hegde, New York University. {mp5847, km3888, nc3468, mittal, chinmay.h}@nyu.edu |
| Pseudocode | Yes | The pseudocode for our CI scheme can be found in Appendix C. |
| Open Source Code | Yes | Our code is available for reproducibility purposes at https://nyu-dice-lab.github.io/CCE/ |
| Open Datasets | Yes | We investigate the Imagenette (Howard, 2019) dataset... the I2P dataset comprises 4703 unique prompts... We conducted a more controlled study by training two diffusion models on MNIST (LeCun et al., 2010) from scratch. |
| Dataset Splits | No | The paper mentions using a small number of samples for training its Concept Inversion algorithm (e.g., '6 samples for art style concept, 30 samples for object concept, and 25 samples for ID concept'), but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or references to predefined splits for general model training. |
| Hardware Specification | Yes | For all our Concept Inversion experiments, unless mentioned otherwise, we perform training on one A100 GPU with a batch size of 4, and a learning rate of 5e-03. |
| Software Dependencies | No | The paper mentions software like 'Stable Diffusion 1.4', 'NudeNet', 'ResNet-50 ImageNet classifier', 'GIPHY celebrity detector', and 'CLIP Retrieval' but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | For all our Concept Inversion experiments, unless mentioned otherwise, we perform training on one A100 GPU with a batch size of 4, and a learning rate of 5e-03. We optimize the word embedding for 1,000 steps while keeping the weights of the erased models frozen. The CI training procedure is the same across erasure methods and concepts, except for ID concepts we optimize for 5,000 steps, and for SLD we train for 1,000 steps with batch size 1. (A hedged code sketch of this setup appears below the table.) |
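
The Pseudocode and Experiment Setup rows describe the paper's Concept Inversion (CI) procedure: a special word embedding is optimized against the frozen, sanitized model so that prompts containing the new token regenerate the supposedly erased concept. The authors' exact pseudocode is in their Appendix C; the sketch below is only a plausible reconstruction, assuming a Textual-Inversion-style training loop over Stable Diffusion 1.4 via the `diffusers`/`transformers` APIs. `encode_batch` and `concept_images` are hypothetical placeholders for the VAE latent-encoding step and the small reference set (e.g., 6 art-style samples); the batch size, learning rate, and step counts come from the quoted setup.

```python
# Hypothetical sketch of Concept Inversion (CI) training. Assumes pre-loaded
# Stable Diffusion 1.4 components: `tokenizer` (CLIPTokenizer), `text_encoder`
# (CLIPTextModel), `unet` (UNet2DConditionModel), and `noise_scheduler`
# (DDPMScheduler). `encode_batch` and `concept_images` are placeholders.
import torch
import torch.nn.functional as F

placeholder = "<erased-concept>"            # new token whose embedding is learned
tokenizer.add_tokens([placeholder])
token_id = tokenizer.convert_tokens_to_ids(placeholder)

text_encoder.resize_token_embeddings(len(tokenizer))
embeddings = text_encoder.get_input_embeddings()

# Freeze all model weights; only the new token's embedding row is updated,
# matching the paper's "no alterations to their weights" claim.
for p in list(unet.parameters()) + list(text_encoder.parameters()):
    p.requires_grad_(False)
embeddings.weight.requires_grad_(True)

optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-3)  # lr from the paper

for step in range(1000):                    # 1,000 steps (5,000 for ID concepts)
    latents = encode_batch(concept_images, batch_size=4)     # hypothetical VAE step
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.size(0),), device=latents.device)
    noisy = noise_scheduler.add_noise(latents, noise, t)

    ids = tokenizer(f"a photo of {placeholder}", return_tensors="pt").input_ids
    cond = text_encoder(ids).last_hidden_state.repeat(latents.size(0), 1, 1)
    pred = unet(noisy, t, encoder_hidden_states=cond).sample  # predicted noise

    F.mse_loss(pred, noise).backward()      # standard diffusion denoising loss
    # Keep every embedding row except the placeholder's untouched.
    mask = torch.zeros_like(embeddings.weight.grad)
    mask[token_id] = 1.0
    embeddings.weight.grad.mul_(mask)
    optimizer.step()
    optimizer.zero_grad()
```

After training, sampling through the otherwise unmodified sanitized pipeline with a prompt containing the placeholder (e.g., "a photo of <erased-concept>") would, per the paper's results, recover the erased concept despite the model's weights being untouched.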