DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning
Authors: Yuxuan Duan, Yan Hong, Bo Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Li Niu, Liqing Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are given to validate the superior performance of DomainGallery on a variety of domain-driven generation scenarios. |
| Researcher Affiliation | Collaboration | Yuxuan Duan¹, Yan Hong², Bo Zhang¹, Jun Lan², Huijia Zhu², Weiqiang Wang², Jianfu Zhang¹, Li Niu¹, Liqing Zhang¹ (¹Shanghai Jiao Tong University, ²Ant Group) |
| Pseudocode | No | The paper includes equations and diagrams of its pipeline, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | Codes of this work are in need of further polishing, and will be released if this paper is accepted. |
| Open Datasets | Yes | We test our method on five widely used 10-shot datasets, including CUFS sketches [45] ([N]: face), FFHQ sunglasses [21] ([N]: face), Van Gogh houses [30] ([N]: house), watercolor dogs [41] ([N]: dog) and wrecked cars [30] ([N]: car). |
| Dataset Splits | No | The paper evaluates against 'training sets' and 'full sets' but does not specify a distinct validation split for hyperparameter tuning. |
| Hardware Specification | Yes | All the experiments running Domain Gallery in this work are done on a single NVIDIA RTX 4090 GPU with 24GB VRAM. |
| Software Dependencies | No | The paper mentions Stable Diffusion v1.4 as the base model and uses the PEFT implementation of LoRA and 8-bit Adam, but does not list version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For prior attribute erasure, we train the model for 500 steps, with batch size 4 and learning rate 1e-4. For finetuning, we initialize LoRA with the parameters ϕ where prior attributes of the identifier [V] are erased, and train the model for 1,000 steps, with batch size 4 and learning rate 5e-5. When generating images during inference, we apply the DDIM [42] scheduler with 50 steps and a CFG [15] scale of λ1 = 7.5. (A hedged configuration sketch is given below the table.) |
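
For concreteness, the sketch below shows how the reported setup could be approximated with Hugging Face `diffusers`, `peft`, and `bitsandbytes`. Only the step counts, batch size, learning rates, DDIM steps, and CFG scale come from the paper; the LoRA rank, target modules, prompt format, and training loop are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch (not the authors' code): Stable Diffusion v1.4 with a PEFT
# LoRA adapter on the UNet and 8-bit Adam, then DDIM sampling at CFG 7.5.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler
from peft import LoraConfig, get_peft_model
import bitsandbytes as bnb

device = "cuda"  # the paper reports a single NVIDIA RTX 4090 (24 GB VRAM)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"  # base model stated in the paper
).to(device)

# Inject LoRA into the UNet attention projections. Rank 8 and these target
# modules are illustrative assumptions; the paper does not state them.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(pipe.unet, lora_config)

# Reported finetuning setup: 1,000 steps, batch size 4, lr 5e-5 with 8-bit
# Adam (prior attribute erasure instead uses 500 steps at lr 1e-4).
optimizer = bnb.optim.AdamW8bit(
    (p for p in unet.parameters() if p.requires_grad), lr=5e-5
)
# ... a standard diffusion denoising-loss training loop over the 10-shot
# dataset would go here; the paper does not detail it further ...

# Reported inference setup: DDIM scheduler, 50 steps, CFG scale 7.5.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
image = pipe(
    "a photo of a [V] dog",    # identifier-prompt format is an assumption
    num_inference_steps=50,    # DDIM, 50 steps
    guidance_scale=7.5,        # λ1 = 7.5 in the paper's notation
).images[0]
image.save("sample.png")
```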