Boundary Guided Learning-Free Semantic Control with Diffusion Models

Authors: Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on multiple DPM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) with different resolutions (64, 256), achieving superior or state-of-the-art performance in various task scenarios (image semantic editing, text-based editing, unconditional semantic control) to demonstrate the effectiveness.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Illinois Institute of Technology; (2) Department of Computer Science, Princeton University; (3) School of Computer Science, Wuhan University; (4) Google Research
Pseudocode | Yes | Algorithm 1: Boundary Diffusion (Conditional). A hedged, high-level sketch of this conditional loop appears after the table.
Open Source Code | Yes | Code is available at https://github.com/L-YeZhu/BoundaryDiffusion.
Open Datasets | Yes | Specifically, we test the DDPMs on the CelebA-64 [39], CelebA-HQ-256 [27], LSUN-Church-256 and LSUN-Bedroom-256 [66]. We also experiment with pretrained improved DDPM [41] on the AFHQ-Dog-256 [10].
Dataset Splits | No | The paper mentions using specific datasets such as CelebA-HQ and refers to a 'test set', but it does not explicitly specify train/validation/test splits (e.g., percentages or sample counts); it relies on established datasets without detailing their splitting methodology.
Hardware Specification | Yes | In practice, the hyperplanes are found via linear SVMs [18], with almost negligible learning time of about 1 second on a single RTX 3090 GPU. And: Specifically, by using the skipping step technique, we can already generate high-quality denoised images using approximately 40-100 steps, which take from 1.682 to 13.272 seconds, respectively, on a single RTX 3090 GPU. (A minimal sketch of the skipping-step schedule appears after the table.)
Software Dependencies | No | The paper mentions the sklearn Python package but does not specify its version, nor versions for other key software components.
Experiment Setup | Yes | Specifically, our approach first locates the semantic boundaries in the form of hyperplanes via SVMs within the latent space at the critical mixing step. We then introduce a mixing trajectory with controllable editing strength, which guides the original latent encoding to cross the semantic boundary at the same diffusion step to achieve manipulation given a target editing attribute, as in Fig. 3. ... We use the linear SVM classifier for searching the semantic boundary. ... For the ϵ-space, the dimensionality is dϵ = 3 × 256 × 256 = 196,608. For the h-space, ... dh = 8 × 8 × 512 = 32,768. In practice, we observe that approximately 100 images are sufficient for finding an effective semantic boundary. For the text-based semantic editing scenario, we use synthetic images generated from Stable Diffusion using text prompts [48]. (A minimal sketch of the boundary search and editing step appears after the table.)
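
To make the Experiment Setup row concrete, here is a minimal sketch of the boundary search and editing step, assuming roughly 100 latent codes at the mixing step with binary attribute labels. The random latents, labels, and the `edit_latent` helper are illustrative stand-ins, not the authors' code; only the use of a linear SVM and the h-space dimensionality (8 × 8 × 512 = 32,768) come from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in data: ~100 latent codes at the mixing step with binary
# attribute labels (e.g., smiling vs. not smiling). Real latents would
# come from inverting images with the pretrained DPM; random data is
# used here only so the sketch runs end to end.
rng = np.random.default_rng(0)
d_h = 8 * 8 * 512                        # h-space dimensionality from the paper
latents = rng.normal(size=(100, d_h))
labels = rng.integers(0, 2, size=100)

# 1. Fit a linear SVM; its weight vector is the normal of the semantic
#    hyperplane separating the two attribute classes.
svm = LinearSVC(C=1.0, max_iter=10_000).fit(latents, labels)
normal = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# 2. Edit one latent by pushing it along the hyperplane normal;
#    `strength` plays the role of the controllable editing strength.
def edit_latent(z, normal, strength=3.0):
    return z + strength * normal

z_edited = edit_latent(latents[0], normal)
```

With only ~100 samples in a 32,768-dimensional space, the SVM fit is nearly instantaneous, which is consistent with the "about 1 second" figure quoted in the Hardware Specification row.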
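
The Pseudocode row names Algorithm 1 but this report does not reproduce it; the following is a hedged, high-level reading of the conditional loop implied by the Experiment Setup quote. `invert_fn` and `denoise_fn` are hypothetical callables standing in for a pretrained DPM's deterministic inversion and reverse process; they are not the authors' API.

```python
def boundary_diffusion_conditional(x0, invert_fn, denoise_fn,
                                   normal, t_mix, strength):
    """Edit image x0 toward a target attribute via its semantic boundary.

    invert_fn(x, t)  -- deterministic inversion of x up to timestep t
    denoise_fn(x, t) -- reverse diffusion of latent x from timestep t
    normal           -- unit normal of the SVM hyperplane in latent space
    t_mix            -- critical mixing step at which editing is applied
    strength         -- controllable editing strength
    """
    x_t = invert_fn(x0, t_mix)        # image -> latent at the mixing step
    x_t = x_t + strength * normal     # cross the semantic boundary
    return denoise_fn(x_t, t_mix)     # edited latent -> edited image
```

The key property of the method, as described in the Experiment Setup row, is that both the boundary and the edit live at the same critical mixing step, so no model fine-tuning or learned guidance network is needed.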
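
The Hardware Specification row's timing figures rest on the skipping-step trick. Below is a minimal sketch of the strided timestep schedule that underlies it (a DDIM-style subsequence of the 1,000 training steps), assuming a hypothetical per-step callable `denoise_fn`; the 50-step count is only loosely matched to the 40-100 range quoted above.

```python
import numpy as np

def make_skip_schedule(num_train_steps=1000, num_sample_steps=50):
    """Evenly strided timestep subsequence, descending for the reverse pass."""
    ts = np.linspace(0, num_train_steps - 1, num=num_sample_steps, dtype=int)
    return ts[::-1]

def sample_with_skipping(x_T, denoise_fn, schedule):
    """Run the reverse process only on the strided schedule.

    denoise_fn(x, t_cur, t_next) is a hypothetical callable performing
    one reverse jump of a pretrained DPM from timestep t_cur to t_next.
    """
    x = x_T
    for t_cur, t_next in zip(schedule[:-1], schedule[1:]):
        x = denoise_fn(x, t_cur, t_next)
    return x

# A 50-step schedule over a 1000-step pretrained model:
schedule = make_skip_schedule(1000, 50)  # e.g. [999, 978, ..., 20, 0]
```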