Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
Authors: Yunnan Wang, Ziqiang Li, Wenyao Zhang, Zequn Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method outperforms recent competitors based on text, layout, or scene graph, in terms of generation rationality and controllability. |
| Researcher Affiliation | Academia | 1 Shanghai Jiao Tong University, Shanghai, China; 2 Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China; 3 University of Science and Technology of China, Hefei, China; 4 The University of Hong Kong, Hong Kong, China |
| Pseudocode | No | The paper describes its method in detail using text and mathematical equations, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is available at https://github.com/wangyunnan/DisCo. |
| Open Datasets | Yes | We conduct scene-graph-to-image (SG2I) generation experiments on the Visual Genome (VG) [27] and COCO-Stuff (COCO) [26] datasets. |
| Dataset Splits | No | The VG dataset comprises 108,077 image-scene graph pairs... Based on the above filtering, we have 62,565 images available for training, each containing an average of 10 objects and 5 relationships. The paper does not explicitly state validation or test splits as percentages or specific counts. |
| Hardware Specification | Yes | We fine-tune the pre-trained Stable-Diffusion 1.5 with the modified Attention module on 4 NVIDIA A100 GPUs, each with 80GB of memory. |
| Software Dependencies | Yes | We fine-tune the pre-trained Stable-Diffusion 1.5 with the modified Attention module... We apply the CLIP text encoder (vit-large-patch14)... We train the model with a batch size of 64 using the AdamW optimizer [30]... During inference, we use the 50-step PNDMScheduler [21] with a classifier-free guidance scale [31] of 7.5. (See the inference sketch after the table.) |
| Experiment Setup | Yes | We train the model with a batch size of 64 using the AdamW optimizer [30] with an initial learning rate of 1.0 × 10^-4, which is adjusted linearly over 50,000 steps. During inference, we use the 50-step PNDMScheduler [21] with a classifier-free guidance scale [31] of 7.5. The sample number N_l in the multi-layered sampler is set to 5. (See the training sketch after the table.) |
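As a rough guide to the reported fine-tuning configuration, the sketch below instantiates the AdamW optimizer and linear learning-rate schedule described in the Experiment Setup row. It assumes the Hugging Face `diffusers` library and the public `runwayml/stable-diffusion-v1-5` checkpoint; the paper's modified Attention module, scene-graph conditioning, and data pipeline are not reproduced here.

```python
# Sketch of the reported optimizer and schedule, assuming the Hugging Face
# `diffusers` library and the public "runwayml/stable-diffusion-v1-5" checkpoint.
# The paper's modified Attention module and SG2I conditioning are not included.
import torch
from diffusers import StableDiffusionPipeline
from diffusers.optimization import get_scheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = pipe.unet  # the module reported as fine-tuned

# AdamW with an initial learning rate of 1.0e-4, adjusted linearly over 50,000 steps
optimizer = torch.optim.AdamW(unet.parameters(), lr=1.0e-4)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,         # warmup is not reported; zero is an assumption
    num_training_steps=50_000,  # reported number of training steps
)
# Reported effective batch size: 64, on 4x NVIDIA A100 80GB GPUs
```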
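Similarly, the inference settings reported in the table (50-step PNDM sampling with a classifier-free guidance scale of 7.5) can be sketched with the standard `diffusers` API. The prompt and checkpoint identifier are placeholders, and DisCo's scene-graph inputs are omitted.

```python
# Sketch of the reported inference settings, assuming the `diffusers` API.
# The prompt and checkpoint identifier are placeholders; DisCo's scene-graph
# conditioning is not modeled here.
from diffusers import StableDiffusionPipeline, PNDMScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)  # PNDM sampler

image = pipe(
    "a placeholder scene description",
    num_inference_steps=50,  # 50 denoising steps
    guidance_scale=7.5,      # classifier-free guidance scale
).images[0]
image.save("sample.png")
```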