Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models

Authors: Wonguk Cho, Seokeon Choi, Debasmit Das, Matthias Reisser, Taesup Kim, Sungrack Yun, Fatih Porikli

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitative and qualitative analyses demonstrate that our approach not only reduces training memory to levels as low as those required for inference but also maintains or improves personalization performance compared to existing methods. (Section 5, Experiments)
Researcher Affiliation | Collaboration | Qualcomm AI Research; Seoul National University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The code will be made available after an internal review process has been completed; it was not available at the time of submission.
Open Datasets | Yes | We use a total of 131 subjects for experiments, utilizing both the DreamBooth [5] and CustomConcept101 [7] datasets.
Dataset Splits | No | The paper describes using the DreamBooth and CustomConcept101 datasets for fine-tuning and evaluation but does not explicitly provide training/validation/test splits.
Hardware Specification | No | The paper repeatedly mentions GPU memory and computational resources and reports memory usage in GB (e.g., 3.88 GB of GPU memory usage), but it does not specify the particular GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions the AdamW optimizer and the Stable Diffusion v2.1 diffusion model, but it does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Following DreamBooth [5], we use a prior preservation loss with 1000 pre-generated class samples. LoRA [13] is applied to the cross- and self-attention layers and fine-tuned for 1000 steps. We use the AdamW optimizer with a learning rate of 1e-5 for full fine-tuning and 1e-4 for the others. Assuming a resource-constrained environment, we use a batch size of 1 and do not update the pre-trained text encoder, while text embeddings are pre-computed before fine-tuning.
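
The experiment-setup row above is specific enough to sketch in code. The sketch below assumes Hugging Face diffusers and peft, which the paper does not name; the model ID, the LoRA rank, and the target module names (to_q, to_k, to_v, to_out.0) are illustrative assumptions, while the optimizer, learning rate, step count, batch size, and frozen text encoder follow the quoted setup.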
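
```python
# Minimal sketch of the quoted fine-tuning setup, assuming Hugging Face
# diffusers + peft (the paper does not name an implementation framework).
# Model ID, LoRA rank, and target module names are illustrative assumptions;
# the learning rate, step count, batch size, and frozen text encoder follow
# the setup quoted in the table above.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Stable Diffusion v2.1 UNet; the text encoder stays frozen and its text
# embeddings are assumed to be pre-computed before fine-tuning.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"
)

# LoRA on the query/key/value/output projections of the cross- and
# self-attention layers (module names follow diffusers' attention naming).
lora_config = LoraConfig(
    r=4,                # rank is not stated in the quoted setup; placeholder
    lora_alpha=4,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet.add_adapter(lora_config)  # diffusers' PEFT integration

# AdamW with lr 1e-4 for the LoRA parameters (the paper uses 1e-5 only for
# full fine-tuning); batch size 1 and 1000 optimization steps.
trainable_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)
num_train_steps = 1000
batch_size = 1
```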
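
The prior preservation term (computed against the 1000 pre-generated class samples) and the denoising training loop itself are not shown; the sketch only encodes the configuration that the quoted setup states explicitly.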