Latent Space Editing in Transformer-Based Flow Matching
Authors: Vincent Tao Hu, Wei Zhang, Meng Tang, Pascal Mettes, Deli Zhao, Cees Snoek
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the experiments on semantic manipulation in the u-space, we mainly use the 256 × 256 CelebA-HQ (Xia et al. 2021) dataset. In Figure 2, we investigate the optimal time interval to inject the guidance signal from the semantic direction... We observe that injecting the signal for too few steps... fails to perform the intended edits. Our method consistently outperformed prompt-to-prompt. |
| Researcher Affiliation | Collaboration | Vincent Tao Hu¹, Wei Zhang¹, Meng Tang², Pascal Mettes¹, Deli Zhao³, Cees Snoek¹; ¹University of Amsterdam; ²University of California, Merced; ³Alibaba Group |
| Pseudocode | Yes | In Algorithm ?? of Appendix, we provide the overall pipeline for semantic direction manipulation in u-space with adaptive step-size ODE solvers. |
| Open Source Code | Yes | Our code will be publicly available at https://taohu.me/lfm/ |
| Open Datasets | Yes | For the experiments on semantic manipulation in the u-space, we mainly use the 256 × 256 CelebA-HQ (Xia et al. 2021) dataset. For prompt-based editing, we conduct the experiments on the Multi-Modal-CelebA-HQ (Xia et al. 2021) and MS COCO (Lin et al. 2014) datasets, with image resolution 256 × 256. |
| Dataset Splits | Yes | Both datasets are composed of text-image pairs for training. For editing real images, we choose the images from the validation set of MS COCO. |
| Hardware Specification | No | The paper mentions running experiments but does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1'). |
| Experiment Setup | Yes | For t_edit, we found t_edit = 0.5 works reasonably well. For the guidance strength w in Equation (7), we observe that w ∈ (−2, 2) generally provides sufficient flexibility while still producing reasonable results. If not mentioned otherwise, we use the adaptive ODE solver dopri5. |
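
The experiment setup quoted above (an edit time t_edit, a guidance strength w, and an adaptive dopri5 solver) can be illustrated with a small sketch. Several parts are assumptions rather than the paper's method: Equation (7) is not reproduced in this summary, so the additive form `u + w * direction` is a guess at its shape; injecting the signal on the interval [0, t_edit] rather than [t_edit, 1] is likewise assumed; `base_velocity` is a hypothetical toy field standing in for the trained transformer; and SciPy's `RK45` (a Dormand-Prince 5(4) pair) is used in place of whichever dopri5 implementation the authors ran.

```python
import numpy as np
from scipy.integrate import solve_ivp


def base_velocity(t, x, target):
    """Hypothetical toy velocity field; a real model would be a trained network."""
    return target - x


def edit_sample(x0, target, direction, w=1.0, t_edit=0.5, rtol=1e-8, atol=1e-10):
    """Integrate the flow ODE from t=0 to t=1 with an adaptive RK45 solver.

    ASSUMPTION: the semantic direction is added to the velocity, scaled by w,
    only while t is in [0, t_edit]; afterwards the unmodified field is used.
    """
    x = np.asarray(x0, dtype=float)

    def guided(t, y):
        return base_velocity(t, y, target) + w * direction

    def unguided(t, y):
        return base_velocity(t, y, target)

    # Integrate the two phases separately so the adaptive solver never has to
    # step across the discontinuity in the right-hand side at t_edit.
    if t_edit > 0.0:
        sol = solve_ivp(guided, (0.0, t_edit), x, method="RK45", rtol=rtol, atol=atol)
        x = sol.y[:, -1]
    if t_edit < 1.0:
        sol = solve_ivp(unguided, (t_edit, 1.0), x, method="RK45", rtol=rtol, atol=atol)
        x = sol.y[:, -1]
    return x


if __name__ == "__main__":
    x0 = np.zeros(2)
    target = np.array([1.0, -1.0])
    direction = np.array([0.0, 1.0])
    for w in (-1.0, 0.0, 1.0):
        print(w, edit_sample(x0, target, direction, w=w))
```

In this toy setting, w = 0 recovers the unedited sample, and flipping the sign of w moves the result in the opposite direction along `direction`, which mirrors the role of a symmetric range for the guidance strength.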