TexFit: Text-Driven Fashion Image Editing with Diffusion Models
Authors: Tongxin Wang, Mang Ye
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the DFMM-Spotlight dataset demonstrate the effectiveness of our model. Code and Datasets are available at https://texfit.github.io/. |
| Researcher Affiliation | Academia | Tongxin Wang, Mang Ye*, National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence, Hubei Key Laboratory of Multimedia and Network Communication Engineering, School of Computer Science, Hubei Luojia Laboratory, Wuhan University, Wuhan, China. {wangtx, yemang}@whu.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and Datasets are available at https://texfit.github.io/. |
| Open Datasets | Yes | Therefore, we develop a new DFMM-Spotlight dataset by using region extraction and attribute combination strategies. It focuses locally on clothes and accessories, enabling local editing with text input. Code and Datasets are available at https://texfit.github.io/. We will make this dataset publicly available and hope that it can aid in the investigation of techniques for the task of local fashion image editing. |
| Dataset Splits | No | We split the DFMM-Spotlight dataset into a training set with 21377 image-region-text pairs and a test set with 2379 pairs following the original split setting in the DeepFashion-MultiModal dataset. No explicit mention of a separate validation set or its size was found. |
| Hardware Specification | Yes | All experiments are performed on a single NVIDIA RTX 3090. |
| Software Dependencies | Yes | We employ Stable Diffusion v1.4 as the pre-trained model for our second-stage fashion image editing module and initialize additional channel weights after restoring the non-inpainting checkpoint. |
| Experiment Setup | Yes | ERLM is trained on DFMM-Spotlight for 100 epochs with a batch size of 8, adopting the Adam optimizer (Kingma and Ba 2015) with the learning rate set to 1 × 10⁻⁴. We finetune it for 140k steps on the DFMM-Spotlight dataset, using the AdamW optimizer (Loshchilov and Hutter 2018) and setting the learning rate to 1 × 10⁻⁵. To save memory, we adopt the strategy of mixed precision (Micikevicius et al. 2018) and gradient accumulation, where the number of gradient accumulation steps is set to 4 and the batch size is set to 1. For inference, we employ the PNDM scheduler (Liu et al. 2021) with 50 steps of iteration and set the classifier-free guidance scale w to 7.5. |
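
The Software Dependencies row reports that the editing module restores a non-inpainting Stable Diffusion v1.4 checkpoint and initializes additional input-channel weights. The sketch below shows one common way to do this with `diffusers`; the 9-channel input layout (4 latent + 4 masked-image latent + 1 mask) follows the standard Stable Diffusion inpainting convention and is an assumption, not something stated in the paper.

```python
# Minimal sketch (not the authors' code): expand the first conv layer of a
# non-inpainting Stable Diffusion v1.4 UNet so it accepts extra conditioning
# channels, keeping the pre-trained weights and zero-initializing the new
# input channels. The 9-channel layout is an assumed inpainting-style layout.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

old_conv = unet.conv_in                       # Conv2d(4, 320, 3, padding=1) in SD v1.4
new_conv = torch.nn.Conv2d(
    in_channels=9,                            # 4 latent + 4 masked latent + 1 mask (assumed)
    out_channels=old_conv.out_channels,
    kernel_size=old_conv.kernel_size,
    padding=old_conv.padding,
)
with torch.no_grad():
    new_conv.weight.zero_()                   # new channels start with zero contribution
    new_conv.weight[:, :4] = old_conv.weight  # restore pre-trained weights for the first 4 channels
    new_conv.bias.copy_(old_conv.bias)

unet.conv_in = new_conv
unet.register_to_config(in_channels=9)        # keep the model config consistent with the new layer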
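```

The Experiment Setup row reports fine-tuning with AdamW at a learning rate of 1 × 10⁻⁵, mixed precision, gradient accumulation over 4 steps, and a batch size of 1. The following is a schematic sketch of that training configuration, continuing from the `unet` defined above; `train_loader` and `diffusion_loss` are placeholders for the DFMM-Spotlight data pipeline and denoising objective, which are not specified here.

```python
# Sketch of the reported fine-tuning settings (AdamW, lr 1e-5, mixed precision,
# gradient accumulation over 4 steps, batch size 1). `train_loader` and
# `diffusion_loss` are hypothetical placeholders, not the authors' implementation.
import torch

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()           # mixed-precision loss scaling
accum_steps = 4

for step, batch in enumerate(train_loader):    # placeholder DFMM-Spotlight loader, batch size 1
    with torch.cuda.amp.autocast():
        loss = diffusion_loss(unet, batch)     # placeholder for the denoising objective
        loss = loss / accum_steps              # average over accumulated micro-batches
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:          # update weights every 4 micro-batches
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```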
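For inference, the paper reports a PNDM scheduler with 50 iterations and a classifier-free guidance scale of 7.5. The sketch below reproduces those settings with the stock `diffusers` inpainting pipeline as a stand-in checkpoint; TexFit's own fine-tuned weights, its ERLM region-localization stage, and the file names used here are assumptions for illustration only.

```python
# Inference sketch with the reported settings (PNDM scheduler, 50 steps,
# classifier-free guidance scale 7.5). The checkpoint is a publicly available
# inpainting pipeline used as a stand-in, not the paper's SD v1.4-based model.
from diffusers import StableDiffusionInpaintPipeline, PNDMScheduler
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting"
)
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

image = Image.open("person.jpg").convert("RGB").resize((512, 512))    # full fashion image (hypothetical file)
mask = Image.open("region_mask.png").convert("L").resize((512, 512))  # editing-region mask (hypothetical file)

edited = pipe(
    prompt="a red floral short-sleeved shirt",  # example editing text
    image=image,
    mask_image=mask,
    num_inference_steps=50,                     # PNDM with 50 iterations
    guidance_scale=7.5,                         # classifier-free guidance scale w
).images[0]
edited.save("edited.png")
```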