Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation

Authors: Yuseung Lee, Taehoon Yoon, Minhyuk Sung

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments on the HRS and DrawBench benchmarks, we achieve state-of-the-art performance compared to previous training-free approaches.
Researcher Affiliation | Academia | Phillip Y. Lee Taehoon Yoon Minhyuk Sung KAIST EMAIL
Pseudocode | Yes | Algorithm 1: Pseudocode of Joint Denoising (Sec. 5.2).
Open Source Code | No | Justification: While we have not provided the code yet, we provide a link to our project page, at which we will provide the official code for our method.
Open Datasets | Yes | In our experiments on the HRS [3] and DrawBench [43] datasets, we evaluate our framework, GrounDiT, using PixArt-α [8] as the base text-to-image DiT model.
Dataset Splits | No | The HRS dataset consists of 1002, 501, and 501 images for each respective criterion, with bounding boxes generated using GPT-4 by Phung et al. [38]. No explicit train/validation/test splits are provided for their experimental setup.
Hardware Specification | Yes | All our experiments were conducted on an NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions using PixArt-α [8] as the base text-to-image DiT model and the DPM-Solver scheduler [35], but does not provide specific software dependency versions (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | For sampling, we employed the DPM-Solver scheduler [35] with 50 steps. Out of the 50 denoising steps, we applied our GrounDiT denoising step (Alg. 2) for the initial 25 steps, and applied the vanilla denoising step for the remaining 25 steps. For the grounding loss in Global Update of GrounDiT, we adopted the definition proposed in R&B [47], and we set the loss scale to 10 and used a gradient descent weight of 5 for the gradient descent update in Eq. 7.
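
The step allocation reported in the Experiment Setup row can be sketched as a small configuration object. This is only an illustration of the stated numbers (50 steps, the first 25 grounded, loss scale 10, gradient descent weight 5). The names `SamplingConfig` and `step_schedule` are hypothetical and do not come from the paper or any released code, and the sketch does not implement the denoising steps, the grounding loss, or Eq. 7.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SamplingConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    total_steps: int = 50          # DPM-Solver sampling steps
    grounded_steps: int = 25       # initial steps that use the grounded denoising step
    loss_scale: float = 10.0       # scale applied to the grounding loss
    gradient_weight: float = 5.0   # weight of the gradient descent update


def step_schedule(cfg: SamplingConfig) -> List[str]:
    """Label each sampling step as 'grounded' or 'vanilla', in order."""
    if not 0 <= cfg.grounded_steps <= cfg.total_steps:
        raise ValueError("grounded_steps must lie between 0 and total_steps")
    return [
        "grounded" if i < cfg.grounded_steps else "vanilla"
        for i in range(cfg.total_steps)
    ]


if __name__ == "__main__":
    schedule = step_schedule(SamplingConfig())
    print(schedule.count("grounded"), schedule.count("vanilla"))
```

Keeping the split as a single integer makes it straightforward to vary how many early steps are grounded without touching the rest of a sampling loop.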