Photoswap: Personalized Subject Swapping in Images

Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments underscore the efficacy and controllability of Photoswap in personalized subject swapping. Furthermore, Photoswap significantly outperforms baseline methods in human ratings across subject swapping, background preservation, and overall quality, revealing its vast application potential, from entertainment to professional editing.
Researcher Affiliation | Collaboration | Jing Gu¹, Yilin Wang², Nanxuan Zhao², Tsu-Jui Fu³, Wei Xiong², Qing Liu², Zhifei Zhang², He Zhang², Jianming Zhang², Hyun Joon Jung², Xin Eric Wang¹ — ¹University of California, Santa Cruz; ²Adobe; ³University of California, Santa Barbara
Pseudocode | Yes | Algorithm 1: The Photoswap Algorithm
Open Source Code | No | The paper provides a project website 'https://photoswap.github.io/' but does not explicitly state that the source code for the methodology is available there or at a specific code repository link. The provided URL is a project overview page without a direct link to a code repository.
Open Datasets | No | For real images, the paper states: 'All prompts, along with the collected image, will be made available in our next revision.' For synthetic images, it says: 'All prompts used in synthetic image generation will also be released too.' This indicates future availability of data rather than current public access to the full datasets used.
Dataset Splits | No | The paper mentions generating images for evaluation and sampling 200 images from the real and synthetic datasets for human evaluation, but it does not specify explicit train, validation, and test splits with percentages or sample counts for the models used.
Hardware Specification | Yes | The DreamBooth training takes around 10 minutes on a machine with 8 A100 GPU cards.
Software Dependencies | Yes | For concept learning, we mainly utilize DreamBooth (Ruiz et al., 2023) to finetune Stable Diffusion 2.1 to learn the new concept from 3-5 images.
Experiment Setup | Yes | During inference, we utilize the DDIM sampling method with 50 denoising steps and classifier-free guidance of 7.5. The default step λA for cross-attention map replacement is 20. The default step λM for self-attention map replacement is 25, while the default step λϕ for self-attention feature replacement is 10. ... The learning rate is set to 1e-6. We use the AdamW optimizer with 800 training steps.
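The step thresholds in the Experiment Setup row can be read as a simple step-gated schedule: during the 50-step DDIM denoising loop, each attention component is copied from the source-image generation into the target generation only while the current step index is below its threshold. The sketch below is a minimal illustration of that gating logic under those assumptions; the class and field names are hypothetical and do not come from the authors' code.

```python
# Minimal sketch of Photoswap's step-gated attention swapping during a
# 50-step DDIM loop. Thresholds follow the paper's reported defaults;
# all names here are illustrative, not the authors' implementation.
from dataclasses import dataclass


@dataclass
class SwapSchedule:
    lambda_A: int = 20    # steps with cross-attention map replacement
    lambda_M: int = 25    # steps with self-attention map replacement
    lambda_phi: int = 10  # steps with self-attention feature replacement

    def actions(self, step: int) -> dict:
        """Return which source-generation components are copied into the
        target denoising pass at this step (steps counted from 0)."""
        return {
            "cross_attn_map": step < self.lambda_A,
            "self_attn_map": step < self.lambda_M,
            "self_attn_feat": step < self.lambda_phi,
        }


schedule = SwapSchedule()
# Early steps swap everything to preserve layout and background;
# later steps stop swapping so the new subject's identity can emerge.
print(schedule.actions(5))   # all three components swapped
print(schedule.actions(22))  # only self-attention maps still swapped
print(schedule.actions(40))  # no swapping; pure target generation
```

Note the ordering λϕ < λA < λM: self-attention features are released first, and self-attention maps (which carry the most spatial layout) are held the longest.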