Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
Authors: Taihang Hu, Linxuan Li, Joost van de Weijer, Hongcheng Gao, Fahad Shahbaz Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, Yaxing Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments to validate the effectiveness of ToMe, comparing it against various existing methods on the T2I-CompBench and our proposed GPT-4o object binding benchmark. |
| Researcher Affiliation | Academia | ¹VCIP, College of Computer Science, Nankai University; ²NKIARI, Shenzhen Futian; ³Computer Vision Center, Universitat Autònoma de Barcelona; ⁴University of Chinese Academy of Sciences; ⁵Mohamed bin Zayed University of AI; ⁶Linköping University |
| Pseudocode | No | The paper describes the method and its components in text and with diagrams (e.g., Figure 4) but does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The code will be publicly available at https://github.com/hutaihang/ToMe. |
| Open Datasets | Yes | Our final method ToMe is quantitatively assessed using the widely adopted T2I-CompBench [31] and our proposed GPT-4o [1] object binding benchmark. |
| Dataset Splits | Yes | We follow the evaluation protocol [21, 30, 34] of using 300 validation prompts for evaluation under each subset. |
| Hardware Specification | Yes | All experiments were conducted on an NVIDIA-A40 GPU. |
| Software Dependencies | No | The paper mentions software like SDXL, spaCy, CLIP, BLIP-VQA, ImageReward, and GPT-4o, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | The iterative composite token update is performed during the first 20% of the denoising steps ($T_{opt} = 0.2T$). ...the overall $\mathcal{L} = \mathcal{L}_{ent} + \lambda \mathcal{L}_{sem}$ is computed by these two novel losses to update the composite token during each time $t < T_{opt}$ and $\lambda$ is the trade-off hyperparameter. |
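
The Experiment Setup row can be read as a simple schedule plus a weighted sum of two losses. The following is a minimal sketch of that reading only, not the authors' implementation: the function names are hypothetical, steps are assumed to be indexed from 0 at the start of denoising, and the value of λ used in the test calls is a placeholder because the summary does not state it.

```python
def num_update_steps(total_steps: int, fraction: float = 0.2) -> int:
    """Number of leading denoising steps that get the token update (T_opt = fraction * T)."""
    return round(fraction * total_steps)


def should_update(step_index: int, total_steps: int, fraction: float = 0.2) -> bool:
    """True while the step index is still below T_opt (assumes 0-based step indexing)."""
    return step_index < num_update_steps(total_steps, fraction)


def combined_loss(l_ent: float, l_sem: float, lam: float) -> float:
    """Overall objective L = L_ent + lam * L_sem; lam is the trade-off hyperparameter."""
    return l_ent + lam * l_sem
```

With a hypothetical 50-step sampler, `num_update_steps(50)` gives 10, so under these assumptions only the first ten steps would update the composite token.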