DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
Authors: Zhengyang Yu, Zhaoyuan Yang, Jing Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that DreamSteerer can significantly improve the editability of several T2I personalization baselines while being computationally efficient. Project page: https://github.com/Dijkstra14/DreamSteerer. |
| Researcher Affiliation | Collaboration | Zhengyang Yu (Australian National University), Zhaoyuan Yang (GE Research), Jing Zhang (Australian National University) |
| Pseudocode | No | The paper describes its methods through text, equations, and diagrams, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://github.com/Dijkstra14/DreamSteerer. |
| Open Datasets | Yes | We use the pre-trained checkpoints provided by DreamMatcher [50], with 16 concepts for each baseline encompassing living, non-living, indoor, and outdoor subjects. The pre-trained checkpoints provided by ViCo [20] have 16 concepts... Textual Inversion dataset: https://github.com/rinongal/textual_inversion DreamBooth dataset: https://dreambooth.github.io/ Custom Diffusion dataset: https://www.cs.cmu.edu/~custom-diffusion/ |
| Dataset Splits | No | The paper does not explicitly provide training, validation, and test dataset splits for the fine-tuning process. It mentions the number of images used for evaluation but not the splits for training their models. |
| Hardware Specification | Yes | Experiments run on a single NVIDIA RTX 3090 GPU take a fine-tuning time of around 1 minute for a batch size of 1. |
| Software Dependencies | No | The paper mentions using the "AdamW" optimizer, a "latent space diffusion model", and "Stable-Diffusion-v1-4" but does not provide specific version numbers for these or other software components like programming languages (e.g., Python), libraries (e.g., PyTorch), or CUDA. |
| Experiment Setup | Yes | For fine-tuning with each baseline, we employ the same set of trainable parameters as the original personalization process. We use an AdamW [46] optimizer for all baselines; the learning rates are set as 1e-3, 1e-6, and 5e-5 for Textual Inversion, DreamBooth, and Custom Diffusion respectively. The total optimization step is set to 10 with 10 cumulative gradient steps. ... the CFG is set as 1 for the inversion process, 3.5 for the sampling process, and negative prompts "oversaturated color, ugly, tiling, low quality, noisy" are employed to replace the null text token. We set the early stopping step as t_early = 30. (See the sketches after this table.) |
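
The reported optimization schedule (AdamW, per-baseline learning rates, 10 steps with 10 cumulative gradient steps at batch size 1) maps onto a standard PyTorch loop. Below is a minimal sketch, not the authors' code; the names `finetune`, `loss_fn`, `batches`, and `BASELINE_LR` are hypothetical.

```python
import torch
from torch.optim import AdamW

# Learning rates reported in the paper, keyed by baseline (hypothetical dict).
BASELINE_LR = {
    "textual_inversion": 1e-3,
    "dreambooth": 1e-6,
    "custom_diffusion": 5e-5,
}

NUM_OPT_STEPS = 10     # "total optimization step is set to 10"
GRAD_ACCUM_STEPS = 10  # "10 cumulative gradient steps"

def finetune(trainable_params, loss_fn, batches, baseline="dreambooth"):
    """Sketch of the reported schedule: each AdamW step accumulates
    gradients over 10 micro-batches of batch size 1."""
    opt = AdamW(trainable_params, lr=BASELINE_LR[baseline])
    batch_iter = iter(batches)  # assumed to yield at least 100 batches
    for _ in range(NUM_OPT_STEPS):
        opt.zero_grad()
        for _ in range(GRAD_ACCUM_STEPS):
            # Scale each micro-batch loss so the accumulated gradient
            # averages over the accumulation window.
            loss = loss_fn(next(batch_iter)) / GRAD_ACCUM_STEPS
            loss.backward()
        opt.step()
```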
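
Likewise, the reported sampling-side settings (CFG scale 3.5 and the listed negative prompt, on Stable-Diffusion-v1-4) can be expressed with a stock diffusers call. This is a sketch under those assumptions, not the DreamSteerer pipeline; the edit prompt is invented for illustration.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# The paper reports CFG = 1 for the inversion process and 3.5 for sampling;
# only the sampling call is shown here.
image = pipe(
    prompt="a photo of a <concept> in the jungle",  # hypothetical edit prompt
    negative_prompt="oversaturated color, ugly, tiling, low quality, noisy",
    guidance_scale=3.5,
).images[0]
```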