GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
Authors: Yuseung Lee, Taehoon Yoon, Minhyuk Sung
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments on the HRS and DrawBench benchmarks, we achieve state-of-the-art performance compared to previous training-free approaches. |
| Researcher Affiliation | Academia | Phillip Y. Lee, Taehoon Yoon, Minhyuk Sung (KAIST) {phillip0701,taehoon,mhsung}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: Pseudocode of Joint Denoising (Sec. 5.2). |
| Open Source Code | No | Justification: The code has not been released yet; the paper links to a project page where the official code will be provided. |
| Open Datasets | Yes | In our experiments on the HRS [3] and DrawBench [43] datasets, we evaluate our framework, GROUNDIT, using PixArt-α [8] as the base text-to-image DiT model. |
| Dataset Splits | No | The HRS dataset consists of 1002, 501, and 501 images for each respective criterion, with bounding boxes generated using GPT-4 by Phung et al. [38]. No explicit train/validation/test splits are provided for their experimental setup. |
| Hardware Specification | Yes | All our experiments were conducted on an NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using PixArt-α [8] as the base text-to-image DiT model and the DPM-Solver scheduler [35], but does not provide specific software dependency versions (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For sampling we employed the DPM-Solver scheduler [35] with 50 steps. Out of the 50 denoising steps, we applied our GROUNDIT denoising step (Alg. 2) for the initial 25 steps, and applied the vanilla denoising step for the remaining 25 steps. For the grounding loss in Global Update of GROUNDIT, we adopted the definition proposed in R&B [47], and we set the loss scale to 10 and used a gradient descent weight of 5 for the gradient descent update in Eq. 7. |
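The sampling schedule in the quoted setup (50 DPM-Solver steps, with the GrounDiT step applied for the first 25 and vanilla denoising for the rest) can be sketched as follows. This is a minimal illustration of the reported schedule only: the function names `groundit_step` and `vanilla_step` are hypothetical placeholders, not the authors' code, and the actual grounding-loss gradient update and noisy-patch transplantation are elided.

```python
# Illustrative sketch of the reported sampling schedule (not the official code).

TOTAL_STEPS = 50      # DPM-Solver steps reported in the paper
GROUNDIT_STEPS = 25   # GrounDiT denoising (Alg. 2) applied to the first half
LOSS_SCALE = 10.0     # grounding-loss scale (R&B-style loss, per the paper)
GD_WEIGHT = 5.0       # gradient-descent weight in the global update (Eq. 7)

def groundit_step(latent, t):
    """Placeholder for Alg. 2: a grounding-loss gradient-descent update
    (conceptually: latent -= GD_WEIGHT * grad of LOSS_SCALE * loss),
    followed by noisy-patch transplantation. Here it is a no-op."""
    return latent

def vanilla_step(latent, t):
    """Placeholder for a standard DPM-Solver denoising step. Here a no-op."""
    return latent

def sample(latent, log=None):
    """Run the 25/25 split schedule; optionally record which step ran."""
    for t in range(TOTAL_STEPS):
        if t < GROUNDIT_STEPS:
            latent = groundit_step(latent, t)
            if log is not None:
                log.append("groundit")
        else:
            latent = vanilla_step(latent, t)
            if log is not None:
                log.append("vanilla")
    return latent
```

With real denoising functions plugged in, the same control flow reproduces the split the authors describe: grounded updates early, when spatial layout is still being formed, and cheaper vanilla steps for the remaining refinement.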