Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
Authors: Xing Cui, Peipei Li, Zekun Li, Xuannan Liu, Yueying Zou, Zhaofeng He
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods. |
| Researcher Affiliation | Academia | 1Beijing University of Posts and Telecommunications 2University of California, Santa Barbara |
| Pseudocode | Yes | A.2 Algorithm Pipeline of LucidDrag: To facilitate the understanding of our LucidDrag, we present the entire algorithm pipeline in Algorithm 1. Algorithm 1: Proposed LucidDrag |
| Open Source Code | Yes | Code is available at: https://github.com/cuixing100876/LucidDrag-NeurIPS2024. |
| Open Datasets | Yes | Following DragDiffusion [53], we utilize the DragBench benchmark which is designed for the image-dragging task. |
| Dataset Splits | No | No explicit mention of a validation split was found; only general training and testing are described. |
| Hardware Specification | Yes | The training of the discriminator can be conducted on an NVIDIA V100 GPU and the inference can be conducted on an NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | No specific software versions (e.g., PyTorch 1.9, Python 3.8) were provided for the dependencies; only general names like "Adam optimizer" and "Stable Diffusion" are given. |
| Experiment Setup | Yes | To train the quality discriminator, we employ the Adam optimizer with a learning rate of 1e-4. We set the training epochs as 100 and the batch size as 128. For the denoising process, we adopt Stable Diffusion [51] as the base model. During sampling, the number of denoising steps is set to T = 50 with a classifier-free guidance of 5. The energy weights for g_quality, g_drag and g_content are set to 1e-3, 4e-4 and 4e-4, respectively. |
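
The experiment-setup row above enumerates concrete hyperparameters. A minimal PyTorch-style sketch of that configuration follows; `QualityDiscriminator` is a hypothetical placeholder (the paper does not disclose its architecture), and only the hyperparameter values are taken from the quoted setup.

```python
# Sketch of the reported LucidDrag experiment setup, assuming a PyTorch-style
# workflow. The discriminator architecture below is a hypothetical stand-in;
# only the hyperparameters (optimizer, lr, epochs, batch size, sampling
# settings, energy weights) come from the paper's setup description.
import torch
import torch.nn as nn

class QualityDiscriminator(nn.Module):
    """Hypothetical placeholder; the paper does not specify the architecture."""
    def __init__(self, in_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Training setup as reported: Adam optimizer, lr 1e-4, 100 epochs, batch 128.
discriminator = QualityDiscriminator()
optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
NUM_EPOCHS = 100
BATCH_SIZE = 128

# Sampling setup as reported: Stable Diffusion base model, T = 50 denoising
# steps, classifier-free guidance of 5, and energy weights 1e-3 / 4e-4 / 4e-4
# for g_quality, g_drag, and g_content, respectively.
sampling_config = {
    "num_inference_steps": 50,
    "guidance_scale": 5.0,
    "energy_weights": {
        "g_quality": 1e-3,
        "g_drag": 4e-4,
        "g_content": 4e-4,
    },
}
```

Note that the mixed hardware specification (V100 for training, RTX 3090 for inference) suggests the two stages above are run as separate jobs rather than in a single script.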