FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on
Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details. |
| Researcher Affiliation | Collaboration | 1 Institute of Science and Technology for Brain-inspired Intelligence, MOE Frontiers Center for Brain Science, and Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; 2 School of Computer Science, Fudan University, Shanghai 200433, China; 3 Suzhou Xiangji Technology Service Co., Ltd., Suzhou 215223, China; 4 Shanghai Center for Brain Science and Brain-inspired Technology, Shanghai 200031, China |
| Pseudocode | No | The paper describes its methods in text and figures but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We conduct experiments on two popular high-resolution VTON benchmarks: the VITON-HD dataset [Choi et al., 2021] and Dress Code dataset [Morelli et al., 2022]. |
| Dataset Splits | No | The paper states: 'We follow the official guidelines to divide the data into training and testing sets [Choi et al., 2021; Morelli et al., 2022].' It specifies training and testing sets but does not explicitly mention a validation set or its split details. |
| Hardware Specification | Yes | We adopt Adam optimizer to optimize all networks with a mini-batch size of 8 and a learning rate of 2.0 × 10⁻⁵ on 4 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'SD KL-regularized auto-encoder', 'DPM solver', and 'Free U', but it does not provide specific version numbers for any of these to ensure reproducibility of the software environment. |
| Experiment Setup | Yes | We adopt Adam optimizer to optimize all networks with a mini-batch size of 8 and a learning rate of 2.0 × 10⁻⁵ on 4 NVIDIA V100 GPUs. In addition, we employ the encoder and decoder of SD KL-regularized auto-encoder, with a down-sampling factor of d = 8 and a latent channel number of c = 4, as our encoder E and decoder D, respectively. We set T = 1,000 for latent diffusion training as suggested by SD [Lee et al., 2022], and use the DPM solver [Lu et al., 2022] with 50 sampling steps for inference. |
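
Based on the experiment-setup details quoted above, the following is a minimal sketch of how that configuration could be assembled in PyTorch with the `diffusers` library. The VAE checkpoint id, the placeholder `denoising_unet`, and the dummy latent mini-batch are assumptions made for illustration only; the paper does not release code, so this is not the authors' implementation.

```python
import torch
from diffusers import AutoencoderKL, DDPMScheduler, DPMSolverMultistepScheduler

# SD KL-regularized auto-encoder with down-sampling factor d = 8 and c = 4 latent
# channels; "stabilityai/sd-vae-ft-mse" is an assumed checkpoint with that geometry.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Latent diffusion trained with T = 1,000 timesteps, as in Stable Diffusion.
train_scheduler = DDPMScheduler(num_train_timesteps=1000)

# Placeholder for the paper's try-on denoising network (architecture not detailed here).
denoising_unet = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

# Adam optimizer with learning rate 2.0e-5; the paper trains with mini-batch size 8
# on 4 NVIDIA V100 GPUs.
optimizer = torch.optim.Adam(denoising_unet.parameters(), lr=2.0e-5)

# One illustrative denoising-training step on a dummy latent mini-batch of size 8.
latents = torch.randn(8, 4, 64, 64)        # stand-in for VAE-encoded person/clothing inputs
noise = torch.randn_like(latents)
t = torch.randint(0, train_scheduler.config.num_train_timesteps, (8,))
noisy_latents = train_scheduler.add_noise(latents, noise, t)
pred = denoising_unet(noisy_latents)        # the real model is conditioned on try-on cues
loss = torch.nn.functional.mse_loss(pred, noise)
loss.backward()
optimizer.step()

# Inference-time sampling: DPM solver with 50 steps, as reported in the paper.
sampler = DPMSolverMultistepScheduler(num_train_timesteps=1000)
sampler.set_timesteps(num_inference_steps=50)
```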