reproducibilityindex.ai

Latent Space Editing in Transformer-Based Flow Matching

Authors: Vincent Tao Hu, Wei Zhang, Meng Tang, Pascal Mettes, Deli Zhao, Cees Snoek

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For the experiments on semantic manipulation in the u-space, we mainly use the 256 × 256 CelebA-HQ (Xia et al. 2021) dataset. In Figure 2, we investigate the optimal time interval to inject the guidance signal from the semantic direction... We observe that injecting the signal for too few steps... fails to perform the intended edits. Our method consistently outperformed prompt-to-prompt.
Researcher Affiliation	Collaboration	Vincent Tao Hu¹, Wei Zhang¹, Meng Tang², Pascal Mettes¹, Deli Zhao³, Cees Snoek¹. ¹University of Amsterdam, ²University of California, Merced, ³Alibaba Group
Pseudocode	Yes	In Algorithm ?? of Appendix, we provide the overall pipeline for semantic direction manipulation in u-space with adaptive step-size ODE solvers.
Open Source Code	Yes	Our code will be publicly available at https://taohu.me/lfm/
Open Datasets	Yes	For the experiments on semantic manipulation in the u-space, we mainly use the 256 × 256 CelebA-HQ (Xia et al. 2021) dataset. For prompt-based editing, we conduct the experiments on the Multi-Modal-CelebA-HQ (Xia et al. 2021) and MS COCO (Lin et al. 2014) datasets, with image resolution 256 × 256.
Dataset Splits	Yes	Both datasets are composed of text-image pairs for training. For editing real images, we choose the images from the validation set of MS COCO.
Hardware Specification	No	The paper mentions running experiments but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1').
Experiment Setup	Yes	For t_edit, we found t_edit = 0.5 works reasonably well. For the guidance strength w in Equation (7), we observe that w ∈ (−2, 2) generally provides sufficient flexibility while still producing reasonable results. If not mentioned otherwise, we use the adaptive ODE solver dopri5.
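
The experiment setup above names three knobs: the edit time t_edit, the guidance strength w, and the adaptive dopri5 solver. The sketch below shows one way these could fit together, under several assumptions that do not come from the paper: the velocity field is a toy linear function rather than a trained transformer, guidance is added to the velocity as w times a fixed direction (which may differ from the paper's Equation (7)), and the signal is injected on the interval [0, t_edit]. All function names are hypothetical. The solver is a small hand-written Dormand-Prince 5(4) integrator, the same Runge-Kutta pair that "dopri5" refers to, so the example needs only the Python standard library.

```python
# Dormand-Prince 5(4) Butcher tableau (the Runge-Kutta pair behind "dopri5").
C = [0.0, 1/5, 3/10, 4/5, 8/9, 1.0, 1.0]
A = [
    [],
    [1/5],
    [3/40, 9/40],
    [44/45, -56/15, 32/9],
    [19372/6561, -25360/2187, 64448/6561, -212/729],
    [9017/3168, -355/33, 46732/5247, 49/176, -5103/18656],
    [35/384, 0.0, 500/1113, 125/192, -2187/6784, 11/84],
]
B5 = A[6] + [0.0]  # 5th-order weights
B4 = [5179/57600, 0.0, 7571/16695, 393/640, -92097/339200, 187/2100, 1/40]


def dopri5_step(f, t, x, h):
    """One trial step: returns the 5th-order state and an error estimate."""
    n = len(x)
    k = []
    for i in range(7):
        xi = [x[d] + h * sum(A[i][j] * k[j][d] for j in range(i)) for d in range(n)]
        k.append(f(t + C[i] * h, xi))
    x5 = [x[d] + h * sum(B5[i] * k[i][d] for i in range(7)) for d in range(n)]
    x4 = [x[d] + h * sum(B4[i] * k[i][d] for i in range(7)) for d in range(n)]
    err = max(abs(a - b) for a, b in zip(x5, x4))
    return x5, err


def integrate(f, x, t0, t1, tol=1e-8, h=0.05):
    """Adaptive-step integration of dx/dt = f(t, x) from t0 to t1."""
    t = t0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        x_new, err = dopri5_step(f, t, x, h)
        if err <= tol:  # accept the step; otherwise retry with a smaller h
            t, x = t + h, x_new
        # Standard step-size controller with a 0.9 safety factor.
        h *= min(5.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.2))
    return x


def base_velocity(t, x):
    # ASSUMPTION: toy stand-in for the learned velocity field.
    return [-xi for xi in x]


def guided_velocity(direction, w):
    # ASSUMPTION: additive guidance, velocity + w * direction.
    def f(t, x):
        u = base_velocity(t, x)
        return [ui + w * di for ui, di in zip(u, direction)]
    return f


def edit(x0, direction, w=1.0, t_edit=0.5):
    # ASSUMPTION: guidance is injected on [0, t_edit], then switched off.
    x_mid = integrate(guided_velocity(direction, w), x0, 0.0, t_edit)
    return integrate(base_velocity, x_mid, t_edit, 1.0)


if __name__ == "__main__":
    print(edit([1.0, -2.0], [1.0, 0.0], w=1.5, t_edit=0.5))
```

Integrating in two segments keeps the switch at t_edit on a step boundary, so the adaptive controller never has to step across the discontinuity in the velocity. Setting w = 0 reduces the sketch to plain unguided sampling, which gives a quick sanity check when experimenting with values of w inside (−2, 2).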