Boundary Guided Learning-Free Semantic Control with Diffusion Models
Authors: Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on multiple DPM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-Church, LSUN-Bedroom, AFHQ-Dog) with different resolutions (64, 256), achieving superior or state-of-the-art performance in various task scenarios (image semantic editing, text-based editing, unconditional semantic control) to demonstrate the effectiveness. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, Illinois Institute of Technology 2Department of Computer Science, Princeton University 3School of Computer Science, Wuhan University 4Google Research |
| Pseudocode | Yes | Algorithm 1 Boundary Diffusion (Conditional) |
| Open Source Code | Yes | Code is available at https://github.com/L-YeZhu/BoundaryDiffusion. |
| Open Datasets | Yes | Specifically, we test the DDPMs on the CelebA-64 [39], CelebA-HQ-256 [27], LSUN-Church-256 and LSUN-Bedroom-256 [66]. We also experiment with pretrained improved DDPM [41] on the AFHQ-Dog-256 [10]. |
| Dataset Splits | No | The paper mentions using specific datasets like Celeb A-HQ and refers to a 'test set', but it does not specify train/validation/test dataset splits (e.g., percentages or sample counts) explicitly within the paper. It relies on established datasets without detailing their specific splitting methodology. |
| Hardware Specification | Yes | In practice, the hyperplanes are found via linear SVMs [18], with almost negligible learning time of about 1 second on a single RTX 3090 GPU. ... Specifically, by using the skipping step techniques, we can already generate high-quality denoised images using approximately 40-100 steps, which take from 1.682 to 13.272 seconds, respectively, on a single RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions 'sklearn python package' but does not specify its version number or versions for other key software components. |
| Experiment Setup | Yes | Specifically, our approach first locates the semantic boundaries in the form of hyperplanes via SVMs within the latent space at the critical mixing step. We then introduce a mixing trajectory with controllable editing strength, which guides the original latent encoding to cross the semantic boundary at the same diffusion step to achieve manipulation given a target editing attribute as in Fig. 3. ... We use the linear SVM classifier for searching the semantic boundary. ... For ϵ-space, the dimensionality dϵ = 3 × 256 × 256 = 196,608. For the h-space, ... dh = 8 × 8 × 512 = 32,768. In practice, we observe approximately 100 images are sufficient for finding an effective semantic boundary. For the text-based semantic editing scenario, we use synthetic images generated from Stable Diffusion using text prompts [48]. |
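The setup above reduces to fitting a linear SVM on ~100 labeled latent codes and treating its weight vector as the normal of the semantic hyperplane. A minimal sketch of that step, using synthetic stand-in latents (the h-space dimensionality dh = 8 × 8 × 512 is from the paper; the data, the attribute labels, the `edit` helper, and the strength parameter `alpha` are illustrative assumptions, not the authors' code):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
d_h = 8 * 8 * 512  # h-space dimensionality reported in the paper (32,768)
n = 100            # ~100 images suffice per the paper

# Synthetic stand-in latents: two classes offset along a random unit direction,
# mimicking latents labeled by a binary attribute (e.g., "smiling" vs. not).
direction = rng.standard_normal(d_h)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=n)
latents = rng.standard_normal((n, d_h)) + 3.0 * labels[:, None] * direction

# Fit the linear SVM; its weight vector is the normal of the semantic hyperplane.
svm = LinearSVC(max_iter=10000)
svm.fit(latents, labels)
normal = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

def signed_distance(h):
    """Signed distance from a latent h to the learned hyperplane."""
    w = svm.coef_[0]
    return (h @ w + svm.intercept_[0]) / np.linalg.norm(w)

def edit(h, alpha):
    """Hypothetical editing step: move a latent along the boundary
    normal with editing strength alpha (crossing flips the attribute)."""
    return h + alpha * normal
```

On real latents one would encode images to the mixing step first and then apply the guided trajectory; this sketch only illustrates why boundary search is cheap (a single linear fit on ~100 vectors).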