Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

Authors: Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong Oh, Yanmin Gong

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we perform extensive experiments to compare different LyCORIS algorithms and to assess the impact of the hyperparameters.
Researcher Affiliation | Collaboration | Shih-Ying Yeh* (NTHU, ay@kblueleaf.net), Yu-Guan Hsieh* (Apple, cyberhsieh212@gmail.com), Zhidong Gao* (UTSA, zhidong.gao@utsa.edu), Bernard B W Yang (University of Toronto, by3976@gmail.com), Giyeong Oh (Yonsei University, hard2251@yonsei.ac.kr), Yanmin Gong (UTSA, gongyanmin@gmail.com)
Pseudocode | No | The paper does not include any figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps formatted like code.
Open Source Code | Yes | Addressing these issues, this paper introduces LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion), an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion.
Open Datasets | Yes | Specifically, our work is based on Stable Diffusion, a text-to-image latent diffusion model (Rombach et al., 2022) pretrained on the LAION 5-billion image dataset (Schuhmann et al., 2022).
Dataset Splits | No | The paper does not explicitly provide specific percentages, sample counts, or a clear methodology for splitting the dataset into training, validation, and test sets. It mentions training using random seeds and saving checkpoints, but no distinct validation set for tuning is described.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments.
Software Dependencies | No | The paper does not specify the version numbers of any key software components or libraries (e.g., Python, PyTorch, CUDA, or even the LyCORIS library itself).
Experiment Setup | Yes | For each of these four algorithms, we define a set of default hyperparameters and then individually vary one of the following hyperparameters: learning rate, trained layers, dimension and alpha for LoRA and LoHa, and factor for LoKr. This leads to 26 distinct configurations. We consider three levels of learning rate, 5 × 10⁻⁷, 10⁻⁶, and 5 × 10⁻⁶ for native fine-tuning, and 10⁻⁴, 5 × 10⁻⁴, and 10⁻³ for the other three algorithms. To investigate the effects of fine-tuning different layers, we examine three distinct presets: i) attn-only, where we only fine-tune attention layers; ii) attn-mlp, where we fine-tune both attention and feedforward layers; and iii) full network, where we fine-tune all the layers, including the convolutional ones. By default, we set the dimension and alpha of LoRA to 8 and 4, and of LoHa to 4 and 2. As for LoKr, we set the factor to 8 and do not perform further decomposition of the second block.
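
To make the quoted experiment setup concrete, the following is a minimal Python sketch (not the authors' code) of a "vary one hyperparameter at a time" sweep around each algorithm's defaults. The default dimension/alpha/factor values and the learning-rate grids come from the quote above; the data structures, function names, layer-preset strings, and the choice of the middle learning rate as the default are my own assumptions, and the dimension/alpha and factor variants are omitted because their alternative values are not restated here. The paper reports 26 distinct configurations once all variants are counted.

# Illustrative sketch only; names and defaults marked below are assumptions.

DEFAULTS = {
    "native": {"lr": 1e-6},                        # native fine-tuning
    "lora":   {"lr": 5e-4, "dim": 8, "alpha": 4},
    "loha":   {"lr": 5e-4, "dim": 4, "alpha": 2},
    "lokr":   {"lr": 5e-4, "factor": 8},           # second block not further decomposed
}
# NOTE: the default learning rates above are assumed to be the middle value of
# each grid; the quote does not restate the paper's actual defaults.

LEARNING_RATES = {
    "native": [5e-7, 1e-6, 5e-6],
    "lora":   [1e-4, 5e-4, 1e-3],
    "loha":   [1e-4, 5e-4, 1e-3],
    "lokr":   [1e-4, 5e-4, 1e-3],
}

LAYER_PRESETS = ["attn-only", "attn-mlp", "full-network"]


def sweep():
    """Yield one run config per single-hyperparameter deviation from the defaults.

    Dimension/alpha (LoRA, LoHa) and factor (LoKr) are also varied in the paper,
    but their alternative values are not given in the quote, so those branches
    are left out of this sketch.
    """
    for algo, base in DEFAULTS.items():
        # Vary the learning rate, keeping every other hyperparameter at its default.
        for lr in LEARNING_RATES[algo]:
            yield {"algo": algo, **base, "lr": lr}
        # Vary the set of trained layers, keeping everything else at its default.
        for preset in LAYER_PRESETS:
            yield {"algo": algo, **base, "layers": preset}


if __name__ == "__main__":
    for cfg in sweep():
        print(cfg)

Writing the sweep as a generator keeps the enumeration lazy, so each configuration can be handed to a separate training run (and duplicates of the all-default configuration removed) before launching jobs.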