Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation

Authors: Yunnan Wang, Ziqiang Li, Wenyao Zhang, Zequn Zhang, Baao Xie, Xihui Liu, Wenjun Zeng, Xin Jin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method outperforms recent competitors based on text, layout, or scene graph, in terms of generation rationality and controllability.
Researcher Affiliation | Academia | 1 Shanghai Jiao Tong University, Shanghai, China; 2 Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China; 3 University of Science and Technology of China, Hefei, China; 4 The University of Hong Kong, Hong Kong, China
Pseudocode | No | The paper describes its method in detail using text and mathematical equations, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code is available at https://github.com/wangyunnan/DisCo.
Open Datasets | Yes | We conduct scene-graph-to-image (SG2I) generation experiments on the Visual Genome (VG) [27] and COCO-Stuff (COCO) [26] datasets.
Dataset Splits | No | The VG dataset comprises 108,077 image-scene graph pairs... Based on the above filtering, we have 62,565 images available for training, each containing an average of 10 objects and 5 relationships. The paper does not explicitly state validation or test splits as percentages or specific counts.
Hardware Specification | Yes | We fine-tune the pre-trained Stable-Diffusion 1.5 with the modified Attention module on 4 NVIDIA A100 GPUs, each with 80GB of memory.
Software Dependencies | Yes | We fine-tune the pre-trained Stable-Diffusion 1.5 with the modified Attention module... We apply the CLIP text encoder (vit-large-patch14)... We train the model with a batch size of 64 using the AdamW optimizer [30]... During inference, we use the 50-step PNDMScheduler [21] with a classifier-free guidance scale [31] of 7.5.
Experiment Setup | Yes | We train the model with a batch size of 64 using the AdamW optimizer [30] with an initial learning rate of 1.0 × 10^-4, which is adjusted linearly over 50,000 steps. During inference, we use the 50-step PNDMScheduler [21] with a classifier-free guidance scale [31] of 7.5. The sample number N_l in the multi-layered sampler is set to 5.
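
The optimization details quoted above (batch size 64, AdamW, initial learning rate 1.0 × 10^-4 decayed linearly over 50,000 steps) translate roughly to the PyTorch/diffusers sketch below. The Hugging Face repo id, the zero warmup count, and the use of the stock Stable Diffusion 1.5 U-Net are assumptions for illustration; the paper's DisCo-specific attention modifications and scene-graph conditioning are not reproduced here.

```python
# Sketch of the reported optimization setup, NOT the authors' training code.
import torch
from diffusers import UNet2DConditionModel
from diffusers.optimization import get_scheduler

# Stand-in for the fine-tuned denoiser (repo id assumed).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# AdamW with an initial learning rate of 1.0e-4, as reported.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1.0e-4)

# "Adjusted linearly over 50,000 steps"; warmup count is not stated, assumed 0.
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=50_000,
)

train_batch_size = 64  # as reported; spread across 4 A100 80GB GPUs in the paper
```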
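The reported sampling configuration (50-step PNDMScheduler with a classifier-free guidance scale of 7.5 on Stable Diffusion 1.5, whose default text encoder is CLIP vit-large-patch14) can likewise be sketched with a standard diffusers pipeline. The repo id and the text prompt are illustrative assumptions; the actual method conditions on scene graphs rather than a plain text prompt.

```python
# Sketch of the reported inference settings on a vanilla SD 1.5 pipeline.
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed repo id for Stable Diffusion 1.5
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the PNDM scheduler used for sampling.
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a sheep standing on grass next to a tree",  # illustrative prompt only
    num_inference_steps=50,   # 50-step PNDMScheduler
    guidance_scale=7.5,       # classifier-free guidance scale of 7.5
).images[0]
image.save("sample.png")
```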