Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models
Authors: Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our experiments, we make the intriguing finding that in many cases, single neurons are responsible for memorizing particular training samples. |
| Researcher Affiliation | Academia | Dominik Hintersdorf (1,2), Lukas Struppek (1,2), Kristian Kersting (1,2,3,4), Adam Dziedzic (5), Franziska Boenisch (5); 1 German Research Center for Artificial Intelligence (DFKI), 2 Computer Science Department, Technical University of Darmstadt, 3 Hessian Center for AI (Hessian.AI), 4 Centre for Cognitive Science, Technical University of Darmstadt, 5 CISPA Helmholtz Center for Information Security |
| Pseudocode | Yes | Alg. 1 defines our algorithm to compute the differences between the initial noise samples and the noise predicted during the first denoising step. |
| Open Source Code | Yes | Code: https://github.com/ml-research/localizing_memorization_in_diffusion_models |
| Open Datasets | Yes | Our set of memorized prompts consists of 500 LAION prompts [35] provided by Wen et al. [46]. |
| Dataset Splits | Yes | We set the memorization score threshold to τmem = 0.428, which corresponds to the mean plus one standard deviation of the pairwise SSIM score between initial noise differences measured on a holdout dataset of 50,000 LAION [35] prompts. |
| Hardware Specification | Yes | We performed all our experiments on NVIDIA DGX machines running NVIDIA DGX Server Version 5.2.0 and Ubuntu 20.04.5 LTS. The machines have 1.5 TB (machine 1) and 2 TB (machine 2) of RAM and contain NVIDIA Tesla V100 SXM3 32GB (machine 1) and NVIDIA A100-SXM4-40GB (machine 2) GPUs with Intel(R) Xeon(R) Platinum 8174 (machine 1) and AMD EPYC 7742 64-core (machine 2) CPUs. |
| Software Dependencies | Yes | We further relied on CUDA 12.1, Python 3.10.13, and PyTorch 2.2.2 with Torchvision 0.17.2 [27] for our experiments. All investigated models are publicly available on Hugging Face. For access, we used the Hugging Face diffusers library with version 0.27.1. |
| Experiment Setup | Yes | All images depicted throughout the paper are generated with fixed seeds, 50 inference steps, and a classifier-free guidance strength of 7 using the default DDIM scheduler. |
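
The "Pseudocode" and "Dataset Splits" rows describe the memorization check used in the paper: compute the difference between the initial noise and the noise predicted during the first denoising step, then compare these differences across seeds via pairwise SSIM against the threshold τmem = 0.428. The snippet below is a minimal sketch of that idea, not the authors' Alg. 1 or their released code; the model ID, the seed set, the 64×64 latent resolution, and the use of torchmetrics for SSIM are illustrative assumptions.

```python
# Sketch of a first-step noise-difference memorization check (illustrative, not Alg. 1).
import itertools

import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline
from torchmetrics.functional import structural_similarity_index_measure as ssim

# Assumed checkpoint; the paper evaluates publicly available Hugging Face models.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.set_timesteps(50, device="cuda")

TAU_MEM = 0.428  # threshold reported in the paper (mean + 1 std on a 50k LAION holdout)


@torch.no_grad()
def first_step_noise_difference(prompt: str, seed: int) -> torch.Tensor:
    """Difference between the initial latent noise and the noise the UNet
    predicts in the first denoising step, conditioned on the prompt."""
    generator = torch.Generator("cuda").manual_seed(seed)
    latents = torch.randn(
        (1, pipe.unet.config.in_channels, 64, 64),
        generator=generator, device="cuda", dtype=torch.float16,
    )
    prompt_embeds, _ = pipe.encode_prompt(
        prompt, device="cuda", num_images_per_prompt=1,
        do_classifier_free_guidance=False,
    )
    t = pipe.scheduler.timesteps[0]
    latent_input = pipe.scheduler.scale_model_input(latents, t)
    noise_pred = pipe.unet(latent_input, t, encoder_hidden_states=prompt_embeds).sample
    return (noise_pred - latents).float()


def memorization_score(prompt: str, seeds=(0, 1, 2, 3)) -> float:
    """Mean pairwise SSIM between noise differences from several seeds.
    Memorized prompts yield nearly identical differences across seeds."""
    diffs = [first_step_noise_difference(prompt, s) for s in seeds]
    scores = [ssim(a, b) for a, b in itertools.combinations(diffs, 2)]
    return torch.stack(scores).mean().item()


# A prompt is flagged as memorized when its score exceeds the threshold.
print(memorization_score("example LAION caption") > TAU_MEM)
```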
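
The "Software Dependencies" and "Experiment Setup" rows pin down the generation settings: the Hugging Face diffusers library, the DDIM scheduler, fixed seeds, 50 inference steps, and a classifier-free guidance strength of 7. A minimal sketch of such a run, assuming Stable Diffusion v1-4 as the checkpoint and seed 0 (both illustrative choices, not taken from the paper):

```python
# Generation with the settings listed in the table: DDIM, 50 steps, guidance 7, fixed seed.
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
# Swap in a DDIM scheduler while keeping the checkpoint's noise-schedule config.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

generator = torch.Generator("cuda").manual_seed(0)  # fixed seed for reproducibility
image = pipe(
    "a photo of an astronaut riding a horse",  # placeholder prompt
    num_inference_steps=50,
    guidance_scale=7.0,
    generator=generator,
).images[0]
image.save("sample.png")
```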