TexFit: Text-Driven Fashion Image Editing with Diffusion Models

Authors: Tongxin Wang, Mang Ye

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on the DFMM-Spotlight dataset demonstrate the effectiveness of our model. Code and Datasets are available at https://texfit.github.io/.
Researcher Affiliation Academia Tongxin Wang, Mang Ye*, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, Hubei Key Laboratory of Multimedia and Network Communication Engineering, School of Computer Science, Hubei Luojia Laboratory, Wuhan University, Wuhan, China. {wangtx, yemang}@whu.edu.cn
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code Yes Code and Datasets are available at https://texfit.github.io/.
Open Datasets Yes Therefore, we develop a new DFMM-Spotlight dataset by using region extraction and attribute combination strategies. It focuses locally on clothes and accessories, enabling local editing with text input. Code and Datasets are available at https://texfit.github.io/. We will make this dataset publicly available and hope that it can aid in the investigation of techniques for the task of local fashion image editing.
Dataset Splits No We split the DFMM-Spotlight dataset into a training set with 21377 image-region-text pairs and a test set with 2379 pairs following the original split setting in the DeepFashion-MultiModal dataset. No explicit mention of a separate validation set or its size was found.
Hardware Specification Yes All experiments are performed on a single NVIDIA RTX 3090.
Software Dependencies Yes We employ Stable Diffusion v1.4 as the pre-trained model for our second-stage fashion image editing module and initialize additional channel weights after restoring the non-inpainting checkpoint.
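The channel-initialization step quoted above can be sketched as follows. This is a hypothetical illustration, not the paper's code: the function name and the 4-to-9 input-channel layout are assumptions based on the standard Stable Diffusion inpainting design (4 latent + 4 masked-latent + 1 mask channels), and plain nested lists stand in for real weight tensors.

```python
# Sketch: extend a first-conv weight tensor with zero-initialized extra
# input channels, so a restored non-inpainting checkpoint behaves
# identically at the start of fine-tuning (the new channels contribute 0).
# Weight shape convention: [out_channels][in_channels][k][k].

def extend_input_channels(weight, new_in_channels):
    """Pad each filter's input-channel dimension with zero-filled k x k channels."""
    extended = []
    for filt in weight:                      # one filter per output channel
        in_channels = len(filt)
        k = len(filt[0])
        zeros = [[[0.0] * k for _ in range(k)]
                 for _ in range(new_in_channels - in_channels)]
        extended.append(list(filt) + zeros)
    return extended

# Toy example: 2 output channels, 4 input channels, 3x3 kernels -> 9 input channels.
w = [[[[0.1] * 3 for _ in range(3)] for _ in range(4)] for _ in range(2)]
w9 = extend_input_channels(w, 9)
print(len(w9), len(w9[0]))  # prints: 2 9
```

In a real PyTorch setup the same idea applies to the UNet's `conv_in.weight`, with the checkpoint's original channels copied over and the remainder zeroed.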
Experiment Setup Yes ERLM is trained on DFMM-Spotlight for 100 epochs with a batch size of 8, adopting the Adam optimizer (Kingma and Ba 2015) with the learning rate set to 1×10^-4. We finetune it for 140k steps on the DFMM-Spotlight dataset, using the AdamW optimizer (Loshchilov and Hutter 2018) and setting the learning rate to 1×10^-5. To save memory, we adopt the strategies of mixed precision (Micikevicius et al. 2018) and gradient accumulation, where the number of gradient accumulation steps is set to 4 and the batch size is set to 1. For inference, we employ the PNDM scheduler (Liu et al. 2021) with 50 steps of iteration and set the classifier-free guidance scale w to 7.5.