Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

PhysDiff-VTON: Cross-Domain Physics Modeling and Trajectory Optimization for Virtual Try-On

Authors: Shibin Mei, Bingbing Ni

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations across multiple datasets demonstrate significant improvements in both geometric plausibility and perceptual quality compared to existing approaches. The framework establishes a new paradigm for synthesizing photorealistic try-on images that adhere to physical constraints while maintaining intricate garment details, advancing the practical applicability of diffusion models in fashion technology.
Researcher Affiliation | Collaboration | Shibin Mei (Huawei); Bingbing Ni (Shanghai Jiao Tong University)
Pseudocode | Yes | Algorithm 1: Potential-Regularized Diffusion Sampling (PRPO)
Open Source Code | No | Justification: We provide detailed implement instructions to reproduce our experimental results. We use open-source datasets and our code will be released soon.
Open Datasets | Yes | We conduct comprehensive evaluations on VITON-HD [3] and Dress Code [24]. We train our model on the VITON-HD dataset, which contains 11,647 person-garment image pairs.
Dataset Splits | No | The paper mentions training on VITON-HD (total size 11,647), evaluating on the VITON-HD and Dress Code test sets, and conducting hyperparameter studies on the VITON-HD validation set. However, it does not explicitly provide the percentages or counts of the training/validation/test splits used for the model.
Hardware Specification | Yes | requiring approximately 95 hours on 4 H800 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer, PyTorch (implied by GPU usage), and references the SDXL inpainting model [1] and the UNet of SDXL [28]. However, it does not specify version numbers for any of these software components.
Experiment Setup | Yes | We employ the Adam optimizer with a fixed learning rate of 1 × 10^-5 for 130 training epochs, requiring approximately 95 hours on 4 H800 GPUs. Our data augmentation strategy aligns with Stable-VITON [18], featuring a 0.5 probability of horizontal flipping and 0.5 probability of random affine transformations. During inference, we utilize the PRPO sampler with 30 denoising steps and maximum strength (η = 1.0), initiating from random noise while disregarding masked regions in the input person image. For classifier-free guidance, inspired by IDM-VITON [4] and SpaText [2], we jointly condition the model using low-level garment features and high-level semantic features from IP-Adapter [43]. The guidance scale w is set to 2.0. All experiments use the same training protocol with a batch size of 32 and 200K iterations.
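For readers who want the reported settings in one place, the following is a minimal sketch that collects the values quoted in the Experiment Setup and Hardware Specification rows into a Python dataclass. The class name, field names, and the helper `cfg_combine` are hypothetical and are not taken from the paper or its (unreleased) code. The guidance function shows only the standard classifier-free guidance combination; how the paper jointly conditions on garment and IP-Adapter features is not specified here, so that part is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TryOnSetup:
    """Reported hyperparameters; all field names are hypothetical."""
    learning_rate: float = 1e-5    # fixed learning rate, Adam optimizer
    epochs: int = 130              # training epochs as reported
    batch_size: int = 32
    iterations: int = 200_000      # "200K iterations" as reported
    num_gpus: int = 4              # H800 GPUs
    train_hours: float = 95.0      # approximate wall-clock training time
    p_hflip: float = 0.5           # probability of horizontal flip
    p_affine: float = 0.5          # probability of random affine transform
    denoising_steps: int = 30      # PRPO sampler steps at inference
    strength: float = 1.0          # eta, maximum strength
    guidance_scale: float = 2.0    # w

    @property
    def gpu_hours(self) -> float:
        # Total compute is wall-clock hours times the number of GPUs.
        return self.num_gpus * self.train_hours


def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance (assumed form, not confirmed by the paper):
    move from the unconditional prediction toward the conditional one by factor w."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


setup = TryOnSetup()
print(setup.gpu_hours)  # 380.0
print(cfg_combine([0.0, 1.0], [1.0, 1.0], setup.guidance_scale))  # [2.0, 1.0]
```

With w = 2.0, the combined prediction extrapolates past the conditional estimate wherever the conditional and unconditional predictions differ, and leaves it unchanged where they agree.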