Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
Authors: Yunnan Wang, Ziqiang Li, Wenyao Zhang, Zequn Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method outperforms recent competitors based on text, layout, or scene graph, in terms of generation rationality and controllability. |
| Researcher Affiliation | Academia | 1 Shanghai Jiao Tong University, Shanghai, China; 2 Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China; 3 University of Science and Technology of China, Hefei, China; 4 The University of Hong Kong, Hong Kong, China |
| Pseudocode | No | The paper describes its method in detail using text and mathematical equations, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is available at https://github.com/wangyunnan/DisCo. |
| Open Datasets | Yes | We conduct scene-graph-to-image (SG2I) generation experiments on the Visual Genome (VG) [27] and COCO-Stuff (COCO) [26] datasets. |
| Dataset Splits | No | The VG dataset comprises 108,077 image-scene graph pairs... Based on the above filtering, we have 62,565 images available for training, each containing an average of 10 objects and 5 relationships. The paper does not explicitly state validation or test splits as percentages or specific counts. |
| Hardware Specification | Yes | We fine-tune the pre-trained Stable-Diffusion 1.5 with the modified Attention module on 4 NVIDIA A100 GPUs, each with 80GB of memory. |
| Software Dependencies | Yes | We fine-tune the pre-trained Stable-Diffusion 1.5 with the modified Attention module... We apply the CLIP text encoder (vit-large-patch14)... We train the model with a batch size of 64 using the AdamW optimizer [30]... During inference, we use the 50-step PNDMScheduler [21] with a classifiers-free scale [31] of 7.5. |
| Experiment Setup | Yes | We train the model with a batch size of 64 using the AdamW optimizer [30] with an initial learning rate of 1.0 × 10⁻⁴, which is adjusted linearly over 50,000 steps. During inference, we use the 50-step PNDMScheduler [21] with a classifiers-free scale [31] of 7.5. The sample number N_l in the multi-layered sampler is set to 5. |
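
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help when comparing them against the released code. The following is a minimal sketch, not the authors' implementation: the class name `ExperimentSetup`, its field names, and the helper `linear_lr` are hypothetical. It reads the learning rate as 1.0e-4, and it assumes that "adjusted linearly" means a linear decay toward a final value of zero; the quoted text does not state the direction or the end point of the schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the values quoted in the table."""
    batch_size: int = 64
    optimizer: str = "AdamW"
    initial_lr: float = 1.0e-4      # read from the garbled "1.0 10 4"
    schedule_steps: int = 50_000    # steps over which the rate is adjusted
    inference_steps: int = 50       # PNDM scheduler steps at inference
    guidance_scale: float = 7.5     # classifier-free guidance scale
    num_layer_samples: int = 5      # N_l in the multi-layered sampler


def linear_lr(step: int, cfg: ExperimentSetup, final_lr: float = 0.0) -> float:
    """Linearly interpolate from cfg.initial_lr to final_lr.

    ASSUMPTION: the schedule is a decay to `final_lr`; the source only
    says the rate is "adjusted linearly over 50,000 steps".
    """
    # Clamp the step into [0, schedule_steps] so the rate stays bounded.
    clamped = min(max(step, 0), cfg.schedule_steps)
    frac = clamped / cfg.schedule_steps
    return cfg.initial_lr + frac * (final_lr - cfg.initial_lr)


if __name__ == "__main__":
    cfg = ExperimentSetup()
    for step in (0, 25_000, 50_000):
        print(step, linear_lr(step, cfg))
```

If the released code uses a warmup or a nonzero final rate instead, only `linear_lr` needs to change; the remaining fields mirror the quoted values directly.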