GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation

Authors: Yuseung Lee, Taehoon Yoon, Minhyuk Sung

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments on the HRS and DrawBench benchmarks, we achieve state-of-the-art performance compared to previous training-free approaches.
Researcher Affiliation | Academia | Phillip Y. Lee, Taehoon Yoon, Minhyuk Sung, KAIST, {phillip0701,taehoon,mhsung}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1: Pseudocode of Joint Denoising (Sec. 5.2).
Open Source Code | No | Justification: While we have not provided the code yet, we provide a link to our project page, where we will release the official code for our method.
Open Datasets | Yes | In our experiments on the HRS [3] and DrawBench [43] datasets, we evaluate our framework, GrounDiT, using PixArt-α [8] as the base text-to-image DiT model. (See the loading sketch below the table.)
Dataset Splits | No | The HRS dataset consists of 1002, 501, and 501 images for each respective criterion, with bounding boxes generated using GPT-4 by Phung et al. [38]. No explicit train/validation/test splits are provided for their experimental setup.
Hardware Specification | Yes | All our experiments were conducted on an NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions using PixArt-α [8] as the base text-to-image DiT model and the DPM-Solver scheduler [35], but does not provide specific software dependency versions (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | For sampling, we employed the DPM-Solver scheduler [35] with 50 steps. Out of the 50 denoising steps, we applied our GrounDiT denoising step (Alg. 2) for the initial 25 steps and the vanilla denoising step for the remaining 25 steps. For the grounding loss in the Global Update of GrounDiT, we adopted the definition proposed in R&B [47]; we set the loss scale to 10 and used a gradient descent weight of 5 for the gradient descent update in Eq. 7. (See the schedule sketch below the table.)
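
For context on the Open Datasets and Experiment Setup rows, the base model and scheduler correspond to a standard Hugging Face diffusers setup. The following is a minimal loading sketch, assuming the PixArtAlphaPipeline and the public PixArt-alpha/PixArt-XL-2-1024-MS checkpoint; the checkpoint id and prompt are illustrative assumptions, and the snippet covers only vanilla sampling, not the GrounDiT grounding procedure.

```python
import torch
from diffusers import PixArtAlphaPipeline, DPMSolverMultistepScheduler

# Load a PixArt-α base model; the exact checkpoint id is an assumption,
# since the paper only names PixArt-α as the base text-to-image DiT.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Swap in the DPM-Solver scheduler used for sampling (50 steps in the paper).
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Vanilla (ungrounded) sampling only; the GrounDiT grounding step is not
# part of this snippet. The prompt is an example, not from HRS/DrawBench.
image = pipe("a cat sitting on a red chair", num_inference_steps=50).images[0]
image.save("sample.png")
```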
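The Experiment Setup row describes a 50-step DPM-Solver schedule in which the GrounDiT denoising step is applied for the first 25 steps and the vanilla step for the remaining 25, with a grounding-loss scale of 10 and a gradient-descent weight of 5. The sketch below illustrates only that scheduling; grounding_loss, groundit_step, and the model/cond interfaces are hypothetical placeholders, the toy loss stands in for the R&B grounding loss, and the noisy-patch transplantation of Alg. 1/2 is omitted.

```python
import torch

def grounding_loss(noise_pred, boxes):
    """Toy stand-in for the R&B grounding loss [47]: penalize predicted-noise
    energy outside the target boxes. The actual loss operates on
    cross-attention maps; this placeholder only keeps the sketch runnable."""
    _, _, h, w = noise_pred.shape
    mask = torch.zeros(h, w, device=noise_pred.device)
    for x0, y0, x1, y1 in boxes:  # boxes in normalized [0, 1] coordinates
        mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    return (noise_pred.float().pow(2) * (1.0 - mask)).mean()

def groundit_step(latents, t, scheduler, model, cond, boxes,
                  loss_scale=10.0, grad_weight=5.0):
    """One GrounDiT-style step: a 'Global Update' by gradient descent on the
    latent (Eq. 7-style), followed by a scheduler step. The joint denoising /
    noisy-patch transplantation of the paper is omitted here."""
    latents = latents.detach().requires_grad_(True)
    noise_pred = model(latents, t, **cond)
    loss = loss_scale * grounding_loss(noise_pred, boxes)
    grad = torch.autograd.grad(loss, latents)[0]
    latents = (latents - grad_weight * grad).detach()
    with torch.no_grad():
        noise_pred = model(latents, t, **cond)
    return scheduler.step(noise_pred, t, latents).prev_sample

def vanilla_step(latents, t, scheduler, model, cond):
    """Standard denoising step with no grounding update."""
    with torch.no_grad():
        noise_pred = model(latents, t, **cond)
    return scheduler.step(noise_pred, t, latents).prev_sample

def sample(latents, scheduler, model, cond, boxes,
           num_steps=50, num_groundit_steps=25):
    """50 DPM-Solver steps: GrounDiT step for the first 25, vanilla after."""
    scheduler.set_timesteps(num_steps)
    for i, t in enumerate(scheduler.timesteps):
        if i < num_groundit_steps:
            latents = groundit_step(latents, t, scheduler, model, cond, boxes)
        else:
            latents = vanilla_step(latents, t, scheduler, model, cond)
    return latents
```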