FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details."
Researcher Affiliation | Collaboration | (1) Institute of Science and Technology for Brain-inspired Intelligence, MOE Frontiers Center for Brain Science, and Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China; (2) School of Computer Science, Fudan University, Shanghai 200433, China; (3) Suzhou Xiangji Technology Service Co., Ltd., Suzhou 215223, China; (4) Shanghai Center for Brain Science and Brain-inspired Technology, Shanghai 200031, China
Pseudocode | No | The paper describes its methods in text and figures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper makes no statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | "We conduct experiments on two popular high-resolution VTON benchmarks: the VITON-HD dataset [Choi et al., 2021] and Dress Code dataset [Morelli et al., 2022]."
Dataset Splits | No | The paper states: "We follow the official guidelines to divide the data into training and testing sets [Choi et al., 2021; Morelli et al., 2022]." Training and testing sets are specified, but no validation set or further split details are given.
Hardware Specification | Yes | "We adopt Adam optimizer to optimize all networks with a mini-batch size of 8 and a learning rate of 2.0 × 10⁻⁵ on 4 NVIDIA V100 GPUs."
Software Dependencies | No | The paper names software components such as the Adam optimizer, the SD KL-regularized auto-encoder, the DPM solver, and FreeU, but gives no version numbers for any of them, so the software environment cannot be reproduced exactly.
Experiment Setup | Yes | "We adopt Adam optimizer to optimize all networks with a mini-batch size of 8 and a learning rate of 2.0 × 10⁻⁵ on 4 NVIDIA V100 GPUs. In addition, we employ the encoder and decoder of SD KL-regularized auto-encoder, with a down-sampling factor of d = 8 and a latent channel number of c = 4, as our encoder E and decoder D, respectively. We set T = 1,000 for latent diffusion training as suggested by SD [Lee et al., 2022], and use the DPM solver [Lu et al., 2022] with 50 sampling steps for inference."
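
To make the reported setup concrete, the sketch below shows how this configuration could be assembled in a PyTorch + diffusers environment. The pretrained VAE checkpoint name and the stand-in denoiser are illustrative assumptions, not the paper's released artifacts; only the quoted hyperparameters (mini-batch size 8, learning rate 2.0 × 10⁻⁵, d = 8, c = 4, T = 1,000, DPM solver with 50 sampling steps) come from the paper.

import torch
from diffusers import AutoencoderKL, DPMSolverMultistepScheduler

# SD KL-regularized auto-encoder: down-sampling factor d = 8, latent channels c = 4.
# The checkpoint name is an assumption; the paper does not name one.
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

# Stand-in for the paper's latent denoising network (the real architecture is not released).
denoiser = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

# Adam optimizer with a learning rate of 2.0e-5, as reported; the mini-batch
# size of 8 would be set in the DataLoader feeding the training loop.
optimizer = torch.optim.Adam(denoiser.parameters(), lr=2.0e-5)

# T = 1,000 diffusion timesteps for training; DPM solver with 50 steps for inference.
scheduler = DPMSolverMultistepScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(num_inference_steps=50)

Pinning the torch and diffusers versions used with such a script would also address the missing software-dependency information noted above.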