AID: Attention Interpolation of Text-to-Image Diffusion

Authors: Qiyuan He, Jinghao Wang, Ziwei Liu, Angela Yao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our method achieves greater consistency, smoothness, and efficiency in condition-based interpolation, aligning closely with human preferences.
Researcher Affiliation | Academia | Qiyuan He (1), Jinghao Wang (2), Ziwei Liu (2), Angela Yao (1); (1) National University of Singapore, (2) S-Lab, Nanyang Technological University
Pseudocode | Yes | Algorithm 1 (Exploration with Beta prior) and Algorithm 2 (Search smoothest sequence) are presented in Appendix D. (A sketch of the Beta-prior exploration follows the table.)
Open Source Code | Yes | Our code and demo are available at https://qyh00.github.io/attention-interpolation-diffusion/.
Open Datasets | Yes | Our proposed framework is evaluated using data from CIFAR-10 [22] and the LAION-Aesthetics subset of the larger LAION-5B collection [39].
Dataset Splits | No | The paper describes sampling methods for trials and iterations but does not explicitly provide training, validation, or test dataset splits for model evaluation.
Hardware Specification | Yes | All quantitative and qualitative experiments presented in this work are conducted on a single H100 GPU with float16 precision.
Software Dependencies | Yes | We use Stable Diffusion 1.4 [35] as the base model to implement our attention interpolation mechanism for quantitative evaluation.
Experiment Setup | Yes | In all experiments, a 512 × 512 image is generated with the DDIM scheduler [42] and the DPM scheduler [26] within 25 timesteps. For Bayesian optimization over α and β in the Beta prior used by our selection approach, we set the smoothness of the interpolation sequence as the objective, [1, 15] as the range of both hyperparameters, 9 fixed exploration points where α and β are chosen from {10, 12, 14}, and 15 optimization iterations. (Sketches of this configuration follow the table.)
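
The pseudocode row above cites Algorithm 1 (Exploration with Beta prior). Below is a minimal sketch of the underlying idea, assuming the interpolation coefficients are produced by pushing a uniform grid through the inverse CDF of Beta(α, β); the function name and this exact construction are illustrative, not the paper's verbatim algorithm.

```python
# Illustrative sketch: warping interpolation coefficients with a Beta prior.
# Assumption (not taken verbatim from the paper): uniform positions in [0, 1]
# are mapped through the inverse CDF of Beta(alpha, beta), concentrating
# frames where the prior places probability mass.
import numpy as np
from scipy.stats import beta as beta_dist

def beta_prior_coefficients(n_frames: int, alpha: float, beta: float) -> np.ndarray:
    """Return n_frames interpolation coefficients in [0, 1] under a Beta(alpha, beta) prior."""
    uniform = np.linspace(0.0, 1.0, n_frames)
    return beta_dist.ppf(uniform, alpha, beta)

# Example: 7 coefficients; alpha = beta = 12 clusters frames near the midpoint.
print(beta_prior_coefficients(7, 12.0, 12.0))
```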
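The software and setup rows report Stable Diffusion 1.4, a DDIM scheduler, 25 timesteps, 512 × 512 outputs, and float16 precision on a single H100. A minimal sketch of that configuration using the Hugging Face diffusers library follows; the checkpoint id and prompt are assumptions, and the paper's attention-interpolation hooks are not reproduced here.

```python
# Sketch of the reported generation setup (SD 1.4, DDIM, 25 steps, 512x512, fp16).
# Assumes the `diffusers` library and the "CompVis/stable-diffusion-v1-4" checkpoint.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # float16 precision, as reported
).to("cuda")  # the paper reports a single H100 GPU
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a photograph of an astronaut riding a horse",  # placeholder prompt
    num_inference_steps=25,  # 25 timesteps, as reported
    height=512,
    width=512,
).images[0]
image.save("sample.png")
```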
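The Bayesian-optimization settings in the experiment-setup row (range [1, 15] for both hyperparameters, 9 fixed exploration points from {10, 12, 14}, 15 iterations) can be sketched with the `bayesian-optimization` package; the `smoothness` objective below is a stand-in, since the real metric comes from the paper's interpolation pipeline.

```python
# Sketch of the reported Bayesian-optimization loop over (alpha, beta).
# Assumes the `bayesian-optimization` package; `smoothness` is a placeholder
# for the paper's actual objective (smoothness of the interpolation sequence).
from itertools import product
from bayes_opt import BayesianOptimization

def smoothness(alpha: float, beta: float) -> float:
    # Placeholder: in the real pipeline this would generate an interpolation
    # sequence with Beta(alpha, beta) coefficients and score its smoothness.
    return -((alpha - 12.0) ** 2 + (beta - 12.0) ** 2)

optimizer = BayesianOptimization(
    f=smoothness,
    pbounds={"alpha": (1.0, 15.0), "beta": (1.0, 15.0)},  # range [1, 15] for both
)

# 9 fixed exploration points: all (alpha, beta) pairs from {10, 12, 14}^2.
for a, b in product([10.0, 12.0, 14.0], repeat=2):
    optimizer.probe(params={"alpha": a, "beta": b}, lazy=True)

# 15 further iterations of Bayesian optimization.
optimizer.maximize(init_points=0, n_iter=15)
print(optimizer.max)
```

Probing the 3 × 3 grid before `maximize()` mirrors the reported 9 fixed exploration points; `lazy=True` queues the probes so they run as the first evaluations.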