DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
Authors: Zhengyang Yu, Zhaoyuan Yang, Jing Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that DreamSteerer can significantly improve the editability of several T2I personalization baselines while being computationally efficient. Project page: https://github.com/Dijkstra14/DreamSteerer. |
| Researcher Affiliation | Collaboration | Zhengyang Yu (Australian National University), Zhaoyuan Yang (GE Research), Jing Zhang (Australian National University) |
| Pseudocode | No | The paper describes its methods through text, equations, and diagrams, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://github.com/Dijkstra14/DreamSteerer. |
| Open Datasets | Yes | We use the pre-trained checkpoints provided by DreamMatcher [50], with 16 concepts for each baseline encompassing living, non-living, indoor, and outdoor subjects. The pre-trained checkpoints provided by ViCo [20] have 16 concepts... Textual Inversion dataset: https://github.com/rinongal/textual_inversion DreamBooth dataset: https://dreambooth.github.io/ Custom Diffusion dataset: https://www.cs.cmu.edu/~custom-diffusion/ |
| Dataset Splits | No | The paper does not explicitly provide training, validation, and test dataset splits for the fine-tuning process. It mentions the number of images used for evaluation but not the splits for training their models. |
| Hardware Specification | Yes | Experiments run on a single NVIDIA RTX 3090 GPU take a fine-tuning time of around 1 minute for a batch size of 1. |
| Software Dependencies | No | The paper mentions using the "AdamW" optimizer, a "latent space diffusion model", and "Stable-Diffusion-v1-4" but does not provide specific version numbers for these or other software components like programming languages (e.g., Python), libraries (e.g., PyTorch), or CUDA. |
| Experiment Setup | Yes | For fine-tuning with each baseline, we employ the same set of trainable parameters as the original personalization process. We use an AdamW [46] optimizer for all baselines; the learning rates are set as 1e-3, 1e-6, and 5e-5 for Textual Inversion, DreamBooth, and Custom Diffusion respectively. The total optimization step is set to 10 with 10 cumulative gradient steps. ... the CFG is set as 1 for the inversion process, 3.5 for the sampling process, and negative prompts "oversaturated color, ugly, tiling, low quality, noisy" are employed to replace the null text token. We set the early stopping step as t_early = 30. (See the sketches after this table.) |
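
The reported optimization schedule (AdamW, per-baseline learning rates, 10 steps with 10 cumulative gradient steps at batch size 1) maps onto a standard PyTorch loop. Below is a minimal sketch, not the authors' code; the names `finetune`, `loss_fn`, `batches`, and `BASELINE_LR` are hypothetical.

```python
import torch
from torch.optim import AdamW

# Learning rates reported in the paper, keyed by baseline (hypothetical dict).
BASELINE_LR = {
    "textual_inversion": 1e-3,
    "dreambooth": 1e-6,
    "custom_diffusion": 5e-5,
}

NUM_OPT_STEPS = 10     # "total optimization step is set to 10"
GRAD_ACCUM_STEPS = 10  # "10 cumulative gradient steps"

def finetune(trainable_params, loss_fn, batches, baseline="dreambooth"):
    """Sketch of the reported schedule: each AdamW step accumulates
    gradients over 10 micro-batches of batch size 1."""
    opt = AdamW(trainable_params, lr=BASELINE_LR[baseline])
    batch_iter = iter(batches)  # assumed to yield at least 100 batches
    for _ in range(NUM_OPT_STEPS):
        opt.zero_grad()
        for _ in range(GRAD_ACCUM_STEPS):
            # Scale each micro-batch loss so the accumulated gradient
            # averages over the accumulation window.
            loss = loss_fn(next(batch_iter)) / GRAD_ACCUM_STEPS
            loss.backward()
        opt.step()
```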
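
Likewise, the reported sampling-side settings (CFG scale 3.5 and the listed negative prompt, on Stable-Diffusion-v1-4) can be expressed with a stock diffusers call. This is a sketch under those assumptions, not the DreamSteerer pipeline; the edit prompt is invented for illustration.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# The paper reports CFG = 1 for the inversion process and 3.5 for sampling;
# only the sampling call is shown here.
image = pipe(
    prompt="a photo of a <concept> in the jungle",  # hypothetical edit prompt
    negative_prompt="oversaturated color, ugly, tiling, low quality, noisy",
    guidance_scale=3.5,
).images[0]
```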