DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning
Authors: Yuxuan Duan, Yan Hong, Bo Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Li Niu, Liqing Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are given to validate the superior performance of DomainGallery on a variety of domain-driven generation scenarios. |
| Researcher Affiliation | Collaboration | Yuxuan Duan¹, Yan Hong², Bo Zhang¹, Jun Lan², Huijia Zhu², Weiqiang Wang², Jianfu Zhang¹, Li Niu¹, Liqing Zhang¹ (¹Shanghai Jiao Tong University, ²Ant Group) |
| Pseudocode | No | The paper includes equations and diagrams of its pipeline, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | Codes of this work are in need of further polishing, and will be released if this paper is accepted. |
| Open Datasets | Yes | We test our method on five widely used 10-shot datasets, including CUFS sketches [45] ([N]: face), FFHQ sunglasses [21] ([N]: face), Van Gogh houses [30] ([N]: house), watercolor dogs [41] ([N]: dog) and wrecked cars [30] ([N]: car). |
| Dataset Splits | No | The paper evaluates against 'training sets' and 'full sets' but does not specify a distinct validation split for hyperparameter tuning. |
| Hardware Specification | Yes | All the experiments running Domain Gallery in this work are done on a single NVIDIA RTX 4090 GPU with 24GB VRAM. |
| Software Dependencies | No | The paper mentions Stable Diffusion v1.4 as the base model and uses the PEFT implementation of LoRA and 8-bit Adam, but does not list version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For prior attribute erasure, we train the model for 500 steps, with batch size 4 and learning rate 1e-4. For finetuning, we initialize LoRA with the parameters ϕ where prior attributes of the identifier [V] are erased, and train the model for 1,000 steps, with batch size 4 and learning rate 5e-5. When generating images during inference, we apply the DDIM [42] scheduler with 50 steps and a CFG [15] scale of λ1 = 7.5. (A hedged configuration sketch is given below the table.) |
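
For concreteness, the sketch below shows how the reported setup could be approximated with Hugging Face `diffusers`, `peft`, and `bitsandbytes`. Only the step counts, batch size, learning rates, DDIM steps, and CFG scale come from the paper; the LoRA rank, target modules, prompt format, and training loop are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch (not the authors' code): Stable Diffusion v1.4 with a PEFT
# LoRA adapter on the UNet and 8-bit Adam, then DDIM sampling at CFG 7.5.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler
from peft import LoraConfig, get_peft_model
import bitsandbytes as bnb

device = "cuda"  # the paper reports a single NVIDIA RTX 4090 (24 GB VRAM)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"  # base model stated in the paper
).to(device)

# Inject LoRA into the UNet attention projections. Rank 8 and these target
# modules are illustrative assumptions; the paper does not state them.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(pipe.unet, lora_config)

# Reported finetuning setup: 1,000 steps, batch size 4, lr 5e-5 with 8-bit
# Adam (prior attribute erasure instead uses 500 steps at lr 1e-4).
optimizer = bnb.optim.AdamW8bit(
    (p for p in unet.parameters() if p.requires_grad), lr=5e-5
)
# ... a standard diffusion denoising-loss training loop over the 10-shot
# dataset would go here; the paper does not detail it further ...

# Reported inference setup: DDIM scheduler, 50 steps, CFG scale 7.5.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
image = pipe(
    "a photo of a [V] dog",    # identifier-prompt format is an assumption
    num_inference_steps=50,    # DDIM, 50 steps
    guidance_scale=7.5,        # λ1 = 7.5 in the paper's notation
).images[0]
image.save("sample.png")
```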