Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Authors: Sheng-Yu Wang, Aaron Hertzmann, Alexei Efros, Jun-Yan Zhu, Richard Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method with a computationally intensive but gold-standard retraining from scratch and demonstrate our method's advantages over previous methods. Our experiments show that our algorithm outperforms prior work on both benchmarks, demonstrating that unlearning synthesized images is an effective way to attribute training images. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Adobe Research, ³UC Berkeley |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at: https://peterwang512.github.io/AttributeByUnlearning. |
| Open Datasets | Yes | We use MSCOCO [25] (~100k images), which allows for retraining models within a reasonable compute budget. |
| Dataset Splits | No | The paper mentions using the MSCOCO 2017 training split and text prompts from the MSCOCO validation set for evaluation, but it does not specify explicit training/validation/test dataset splits with percentages or sample counts for its own model training. |
| Hardware Specification | Yes | We conduct all of our experiments on A100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch, CUDA) used in its experimental setup; it only mentions specific models such as "Stable Diffusion v2" or "ViT-B/32". |
| Experiment Setup | Yes | To retrain each MSCOCO model for leave-K-out evaluation, we follow the same training recipe as the source model, where each model is trained for 200 epochs with a learning rate of 10^-4 and a batch size of 128. To unlearn a synthesized sample in MSCOCO models, we find that running with 1 step already yields good attribution performance. We perform Newton unlearning updates with step sizes of 0.01 and update only the cross-attention KV weights (W_k, W_v). (A configuration sketch follows the table.) |
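
The Experiment Setup row pins down which weights are touched (cross-attention K/V projections) and the unlearning step size (0.01, one step). Below is a minimal sketch of how that restriction could be expressed, assuming the Hugging Face `diffusers` parameter naming convention (`attn2.to_k` / `attn2.to_v`) and substituting a plain gradient step for the paper's Newton update; the model id and the `unlearning_step` helper are illustrative, not the authors' code.

```python
# Hedged sketch: restrict updates to the cross-attention K/V projections of a
# Stable Diffusion UNet and apply a simplified unlearning step. The paper's
# second-order (Newton) update is replaced by a first-order gradient ascent
# step purely for illustration.
import torch
from diffusers import UNet2DConditionModel

# Illustrative checkpoint; the paper's MSCOCO models are trained from scratch.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-base", subfolder="unet"
)

# Freeze everything, then re-enable only the cross-attention key/value
# projections (W_k, W_v); in diffusers these live under "attn2.to_k"/"attn2.to_v".
for p in unet.parameters():
    p.requires_grad_(False)
kv_params = [
    p for name, p in unet.named_parameters()
    if "attn2.to_k" in name or "attn2.to_v" in name
]
for p in kv_params:
    p.requires_grad_(True)

step_size = 0.01        # unlearning step size reported in the table
num_unlearn_steps = 1   # a single step suffices for the MSCOCO models

def unlearning_step(loss_on_synthesized_sample: torch.Tensor) -> None:
    """One simplified unlearning update on the K/V weights.

    The paper preconditions this step with second-order information (a Newton
    update); here we only sketch the ascent direction on the diffusion loss of
    the synthesized sample being unlearned.
    """
    grads = torch.autograd.grad(loss_on_synthesized_sample, kv_params)
    with torch.no_grad():
        for p, g in zip(kv_params, grads):
            p.add_(step_size * g)  # ascend the loss to "forget" the sample
```

In practice the loss passed to `unlearning_step` would be the standard diffusion denoising loss evaluated on the synthesized image and its prompt; attribution scores are then derived from how much each training image's loss changes after the update.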