Latent Space Editing in Transformer-Based Flow Matching

Authors: Vincent Tao Hu, Wei Zhang, Meng Tang, Pascal Mettes, Deli Zhao, Cees Snoek

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For the experiments on semantic manipulation in the u-space, we mainly use the 256 × 256 CelebA-HQ (Xia et al. 2021) dataset. In Figure 2, we investigate the optimal time interval to inject the guidance signal from the semantic direction... We observe that injecting the signal for too few steps... fails to perform the intended edits. Our method consistently outperformed prompt-to-prompt.
Researcher Affiliation | Collaboration | Vincent Tao Hu¹, Wei Zhang¹, Meng Tang², Pascal Mettes¹, Deli Zhao³, Cees Snoek¹; ¹University of Amsterdam, ²University of California, Merced, ³Alibaba Group
Pseudocode | Yes | In Algorithm ?? of Appendix, we provide the overall pipeline for semantic direction manipulation in u-space with adaptive step-size ODE solvers.
Open Source Code | Yes | Our code will be publicly available at https://taohu.me/lfm/
Open Datasets | Yes | For the experiments on semantic manipulation in the u-space, we mainly use the 256 × 256 CelebA-HQ (Xia et al. 2021) dataset. For prompt-based editing, we conduct the experiments on the Multi-Modal-CelebA-HQ (Xia et al. 2021) and MS COCO (Lin et al. 2014) datasets, with image resolution 256 × 256.
Dataset Splits | Yes | Both datasets are composed of text-image pairs for training. For editing real images, we choose the images from the validation set of MS COCO.
Hardware Specification | No | The paper mentions running experiments but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1').
Experiment Setup | Yes | For t_edit, we found t_edit = 0.5 works reasonably well. For the guidance strength w in Equation (7), we observe that w ∈ (−2, 2) generally provides sufficient flexibility while still producing reasonable results. If not mentioned otherwise, we use the adaptive ODE solver dopri5.
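
To make the Experiment Setup row concrete, the following is a minimal sketch of how a guidance term with strength w and an edit time t_edit could enter an adaptive-step ODE solve. It is not the authors' implementation: the velocity field, the additive form of the guidance, and the choice to inject it for t <= t_edit are all assumptions made for illustration, and the names toy_velocity, guided_velocity, and sample are hypothetical. SciPy's RK45 method is used as a Dormand-Prince 5(4) solver in place of the dopri5 solver named in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

T_EDIT = 0.5  # edit time reported in the setup row
W = 1.5       # guidance strength, picked inside the reported range (-2, 2)


def toy_velocity(t, x):
    # Hypothetical stand-in for the learned flow-matching velocity field.
    return -x


def guided_velocity(t, x, direction, w, t_edit):
    # Assumption: guidance is a constant shift along `direction`,
    # applied only while t <= t_edit.
    v = toy_velocity(t, x)
    if t <= t_edit:
        v = v + w * direction
    return v


def sample(x0, direction, w=W, t_edit=T_EDIT):
    # Integrate from t=0 to t=1 with an adaptive Dormand-Prince 5(4) solver.
    sol = solve_ivp(
        guided_velocity,
        (0.0, 1.0),
        x0,
        method="RK45",
        args=(direction, w, t_edit),
        rtol=1e-6,
        atol=1e-6,
    )
    return sol.y[:, -1]


if __name__ == "__main__":
    x0 = np.array([1.0, -2.0])
    direction = np.array([0.0, 1.0])
    print("unedited:", sample(x0, direction, w=0.0))
    print("edited:  ", sample(x0, direction, w=W))
```

With w = 0 the sketch reduces to the unguided flow, so sweeping w over values in (−2, 2) shows how far the endpoint moves along the chosen direction in this toy setting.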