Interpreting the Weight Space of Customized Diffusion Models
Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei Efros, Kfir Aberman
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of this space that result in new diffusion models: sampling, editing, and inversion. First, sampling a set of weights from this space results in a new model encoding a novel identity. Next, we find linear directions in this space corresponding to semantic edits of the identity (e.g., adding a beard), resulting in a new model with the original identity edited. Finally, we show that inverting a single image into this space encodes a realistic identity into a model, even if the input image is out of distribution (e.g., a painting). We further find that these linear properties of the diffusion model weight space extend to other visual concepts. Our results indicate that the weight space of fine-tuned diffusion models can behave as an interpretable meta-latent space producing new models. (See the subspace-sampling sketch after the table.) |
| Researcher Affiliation | Collaboration | Amil Dravid¹,² Yossi Gandelsman¹ Kuan-Chieh Wang² Rameen Abdal³ Gordon Wetzstein³ Alexei A. Efros¹ Kfir Aberman² (¹UC Berkeley, ²Snap Inc., ³Stanford University) |
| Pseudocode | No | The paper describes methods and uses mathematical equations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code: https://github.com/snap-research/weights2weights |
| Open Datasets | Yes | We generate a synthetic dataset of 65,000 identities using [67], where each identity is associated with multiple images of that person. Each identity is based on an image with labeled binary attributes (e.g., male/female) from CelebA [36]. |
| Dataset Splits | Yes | Further details on this dataset and train/test splits are provided in Appendix E. [...] For evaluating identity edits from Sec. 4.3, we hold out 100 identities, which results in leaving out 1000 models since multiple models may encode different instances of the same identity. (See the identity-level split sketch after the table.) |
| Hardware Specification | Yes | Table 4: Inversion into w2w space balances identity preservation and efficiency. [...] We measure the training time on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions 'Stable Diffusion 1.5' and the use of 'Adam' optimizer and 'Hugging Face' implementation, but does not provide specific version numbers for core software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We conduct Dreambooth fine-tuning using LoRA with rank 1 on the identities. [...] We run Principal Component Analysis (PCA) on the 65,000 training models and project to the first 1000 principal components [...]. We optimize for 400 epochs, using Adam [30] with learning rate 0.1, β1 = 0.9, β2 = 0.999, and weight decay factor 1e-10. (See the optimizer sketch after the table.) |
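
The three applications quoted in the abstract all operate in a single linear subspace of fine-tuned weights. Below is a minimal sketch of that construction and of the sampling application, assuming each fine-tuned model has been flattened into a vector of its LoRA weight deltas; the shapes, variable names, and Gaussian sampling heuristic are illustrative stand-ins, not the paper's released code.

```python
import numpy as np

# Scaled-down stand-ins: the paper uses 65,000 models and keeps the first
# 1,000 principal components; D is the flattened LoRA-delta dimension.
N, D, K = 2_000, 4_096, 100
theta = np.random.randn(N, D).astype(np.float32)  # placeholder weight dataset

# Build the weights2weights subspace: mean-center, then PCA via SVD.
mean = theta.mean(axis=0)
_, _, Vt = np.linalg.svd(theta - mean, full_matrices=False)
components = Vt[:K]                           # (K, D) principal directions
coeffs = (theta - mean) @ components.T        # (N, K) models as PC coefficients

# Sampling: draw coefficients matching the empirical spread of the training
# models along each direction, then map back to a new weight vector.
rng = np.random.default_rng(0)
new_coeffs = rng.normal(coeffs.mean(axis=0), coeffs.std(axis=0))
new_weights = mean + new_coeffs @ components  # LoRA deltas for a novel identity
```

In the same space, editing amounts to shifting a model's coefficients along a semantic direction, and inversion amounts to optimizing the coefficients against a target image.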
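Because several models can encode different instances of the same identity, held-out splits must be made at the identity level rather than the model level to avoid leakage. A minimal sketch, assuming a hypothetical per-model label array `identity_ids`:

```python
import numpy as np

# Identity-level split: with roughly 10 models per identity, holding out
# 100 identities leaves out ~1,000 models, matching the numbers quoted above.
identity_ids = np.repeat(np.arange(6_500), 10)  # stand-in: 10 models/identity

rng = np.random.default_rng(0)
held_out = rng.choice(np.unique(identity_ids), size=100, replace=False)

test_mask = np.isin(identity_ids, held_out)
train_idx = np.flatnonzero(~test_mask)          # indices of training models
test_idx = np.flatnonzero(test_mask)            # 1,000 held-out models
```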
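For the inversion application, the quoted hyperparameters describe optimizing coordinates in the w2w subspace. A minimal sketch with the stated Adam settings; the subspace tensors and the loss are placeholders, since a real run would plug the reconstructed weights into the LoRA layers and use the diffusion denoising loss on the input image:

```python
import torch

# Stand-in subspace tensors; only the optimizer settings below come from
# the paper's stated setup.
K, D = 1_000, 4_096
mean = torch.zeros(D)
components = torch.randn(K, D) / D ** 0.5    # placeholder principal directions
target = torch.randn(D)                      # placeholder optimization target

coeffs = torch.zeros(K, requires_grad=True)  # coordinates in w2w space

# Adam hyperparameters exactly as quoted in the setup row.
optimizer = torch.optim.Adam(
    [coeffs], lr=0.1, betas=(0.9, 0.999), weight_decay=1e-10
)

for epoch in range(400):                     # 400 epochs, per the paper
    optimizer.zero_grad()
    weights = mean + coeffs @ components     # map coefficients back to weights
    loss = torch.nn.functional.mse_loss(weights, target)  # placeholder loss
    loss.backward()
    optimizer.step()
```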