Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion
Authors: Shengqiong Wu, Hao Fei, Hanwang Zhang, Tat-Seng Chua
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the benchmark COCO dataset, our system outperforms the existing best-performing T2I model by a significant margin, especially improving on the abstract-to-intricate T2I generation. Further in-depth analyses reveal how our methods advance. |
| Researcher Affiliation | Academia | Shengqiong Wu¹, Hao Fei¹, Hanwang Zhang², Tat-Seng Chua¹; ¹NExT++, School of Computing, National University of Singapore; ²School of Computer Science and Engineering, Nanyang Technological University |
| Pseudocode | No | The paper describes the model architecture and processes using mathematical formulations and textual descriptions, but it does not include a block of pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/ChocoWu/T2I-Salad |
| Open Datasets | Yes | We conduct T2I generation experiments mainly on the COCO [33] dataset. We also prepare the abstract-to-intricate SG pair annotations for training the SGH module, where we employ an external textual SG parser [46] and a visual SG parser [59] on the paired images and texts in COCO, to obtain the initial SG and imagined SG, respectively. To enlarge the abstract-to-intricate SG pairs, we further extend Visual Genome (VG) [30]. |
| Dataset Splits | Yes | The training and validation data numbers in COCO are 83K and 41k, respectively. We note that, in the evaluation phase, models are evaluated on the full COCO 2014 validation set. |
| Hardware Specification | No | The paper mentions loading parameters from Stable Diffusion (v1.4) and using CLIP (vit-large-patch14) but does not specify any hardware details such as GPU models, CPU types, or memory used for training or inference. |
| Software Dependencies | Yes | For the SIS module, we load the parameters of Stable Diffusion (v1.4) as the initialization. We use the CLIP (vit-large-patch14) as our text encoder. We optimize the framework using AdamW [34] with β1 = 0.9 and β2 = 0.98. |
| Experiment Setup | Yes | We define the maximum number of SG object nodes as 30, and each object node has a maximum of 3 attributes. We set the timesteps (T) for SGH and SIS as 100. We optimize the framework using AdamW [34] with β1 = 0.9 and β2 = 0.98. The learning rate is set to 5e-5 after 10,000 iterations of warmup. For the attention layer in SG decoder and UNet in SIS, we define a shared configuration as follows: 4 layers, 8 attention heads, 512 embedding dimensions, 2,048 hidden dimensions, and 0.1 dropout rate. |
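
For readers who want the reported hyperparameters in one place, the Experiment Setup row can be collected into a small configuration object. This is a minimal sketch, not the authors' code: the class name `ExperimentSetup`, its field names, and the helper `learning_rate_at` are hypothetical, and the linear shape of the warmup is an assumption, since the quoted text only states that the rate reaches 5e-5 after 10,000 warmup iterations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    max_object_nodes: int = 30        # maximum SG object nodes
    max_attributes_per_node: int = 3  # maximum attributes per object node
    diffusion_timesteps: int = 100    # T for both SGH and SIS
    adam_beta1: float = 0.9           # AdamW beta1
    adam_beta2: float = 0.98          # AdamW beta2
    learning_rate: float = 5e-5       # rate reached after warmup
    warmup_iterations: int = 10_000
    num_layers: int = 4               # shared attention configuration
    num_attention_heads: int = 8
    embedding_dim: int = 512
    hidden_dim: int = 2048
    dropout: float = 0.1


def learning_rate_at(step: int, setup: ExperimentSetup) -> float:
    """Return the learning rate at a given step.

    Assumption: linear warmup from 0, then a constant rate. The source
    does not state the warmup shape or any later decay.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step >= setup.warmup_iterations:
        return setup.learning_rate
    return setup.learning_rate * step / setup.warmup_iterations


setup = ExperimentSetup()
# Per-head dimension implied by 512 embedding dimensions and 8 heads.
head_dim = setup.embedding_dim // setup.num_attention_heads
```

Under these assumptions, `learning_rate_at(10_000, setup)` returns the full 5e-5, and `head_dim` is the per-head width implied by the shared attention configuration. Hardware, batch size, and total training iterations are not quoted in the table, so they are left out rather than guessed.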