Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

Authors: Xing Cui, Peipei Li, Zekun Li, Xuannan Liu, Yueying Zou, Zhaofeng He

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods.
Researcher Affiliation | Academia | 1 Beijing University of Posts and Telecommunications; 2 University of California, Santa Barbara
Pseudocode | Yes | A.2 Algorithm Pipeline of LucidDrag: To facilitate the understanding of our LucidDrag, we present the entire algorithm pipeline in Algorithm 1 (Algorithm 1: Proposed LucidDrag).
Open Source Code | Yes | Code is available at: https://github.com/cuixing100876/LucidDrag-NeurIPS2024.
Open Datasets | Yes | Following DragDiffusion [53], we utilize the DragBench benchmark, which is designed for the image-dragging task.
Dataset Splits | No | No explicit mention of validation dataset splits or their usage was found, other than general training and testing.
Hardware Specification | Yes | The training of the discriminator can be conducted on an NVIDIA V100 GPU, and the inference can be conducted on an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | No specific software versions (e.g., PyTorch 1.9, Python 3.8) were provided; only general dependency names such as "Adam optimizer" and "Stable Diffusion" are mentioned.
Experiment Setup | Yes | To train the quality discriminator, we employ the Adam optimizer with a learning rate of 1e-4. We set the training epochs as 100 and the batch size as 128. For the denoising process, we adopt Stable Diffusion [51] as the base model. During sampling, the number of denoising steps is set to T = 50 with a classifier-free guidance scale of 5. The energy weights for g_quality, g_drag, and g_content are set to 1e-3, 4e-4, and 4e-4, respectively.
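For quick reference, the quoted hyperparameters can be collected into a minimal PyTorch-style sketch. This is an illustration only: the `QualityDiscriminator` class below is a hypothetical placeholder (the paper's actual discriminator architecture is not given in this summary), and the energy-guided sampling loop itself is omitted.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the quality discriminator; the real architecture
# is not specified here, only its optimizer and training schedule.
class QualityDiscriminator(nn.Module):
    def __init__(self, feature_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Discriminator training settings quoted from the paper.
discriminator = QualityDiscriminator()
optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
num_epochs, batch_size = 100, 128

# Sampling settings for the Stable Diffusion base model (quoted values).
num_denoising_steps = 50   # T = 50
guidance_scale = 5.0       # classifier-free guidance

# Energy weights for the g_quality, g_drag, and g_content guidance terms.
w_quality, w_drag, w_content = 1e-3, 4e-4, 4e-4
```

These values mirror the setup quoted above; any names not appearing in the paper (e.g., the placeholder feature dimension) are assumptions made for illustration.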