SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation

Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on benchmark datasets demonstrate that our SSMG achieves highly promising results, setting a new state-of-the-art across a range of metrics encompassing fidelity, diversity, and controllability.
Researcher Affiliation | Collaboration | Chengyou Jia (1), Minnan Luo (1), Zhuohang Dang (1), Guang Dai (2,3), Xiaojun Chang (4,5), Mengmeng Wang (6,2), Jingdong Wang (7); 1 School of Computer Science and Technology, MOEKLINNS Lab, Xi'an Jiaotong University; 2 SGIT AI Lab; 3 State Grid Corporation of China; 4 University of Technology Sydney; 5 Mohamed bin Zayed University of Artificial Intelligence; 6 Zhejiang University; 7 Baidu Inc
Pseudocode | No | The paper does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any specific links or explicit statements regarding the availability of its source code.
Open Datasets | Yes | Datasets. We adopt the widely recognized COCO-Thing-Stuff benchmark (Lin et al. 2014; Caesar, Uijlings, and Ferrari 2018) for both training and evaluation.
Dataset Splits | Yes | It consists of 118,287 training and 5,000 validation images, which are annotated with 80 thing/object classes and 182 semantic stuff classes.
Hardware Specification | Yes | The model is trained on 4 NVIDIA A100 GPUs with a batch size of 64, requiring 2 days for 50 epochs.
Software Dependencies | No | The paper mentions the PyTorch Lightning framework and Stable Diffusion v1.5 and v2.1, but it does not give version numbers for PyTorch Lightning itself or for any other libraries used.
Experiment Setup | Yes | During training, we take AdamW as the optimizer within the PyTorch Lightning framework. We resize the input images to 512×512. The model is trained on 4 NVIDIA A100 GPUs with a batch size of 64, requiring 2 days for 50 epochs. During inference, we use 20 DDIM (Song, Meng, and Ermon 2020) sampling steps with a classifier-free guidance (Ho and Salimans 2022) scale of 9.
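
The training settings quoted above (AdamW inside PyTorch Lightning, 512×512 inputs, batch size 64 across 4 GPUs, 50 epochs) can be mirrored in a short Lightning skeleton. The sketch below is illustrative only: since the authors' code is not released, the class name `SSMGModule`, the learning rate, and the loss call are assumptions, not the paper's implementation.

```python
# Minimal sketch of the reported training configuration.
# SSMGModule, the learning rate, and the model's loss interface are
# hypothetical; only AdamW, the epoch count, the GPU count, and the
# batch size come from the paper.
import pytorch_lightning as pl
import torch


class SSMGModule(pl.LightningModule):  # hypothetical name
    def __init__(self, model, lr=1e-4):  # lr is an assumption; not stated in the paper
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        images, layout = batch  # 512x512 images with their spatial-semantic maps
        loss = self.model(images, layout)  # placeholder for the diffusion denoising loss
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # The paper states AdamW is the optimizer.
        return torch.optim.AdamW(self.parameters(), lr=self.lr)


# 4 GPUs with a per-device batch size of 16 gives the effective batch size of 64:
# trainer = pl.Trainer(accelerator="gpu", devices=4, max_epochs=50)
# trainer.fit(SSMGModule(model), train_dataloader)
```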
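The inference settings (20 DDIM sampling steps, classifier-free guidance scale 9) map directly onto standard diffusion tooling. Below is a minimal sketch using Hugging Face diffusers with a plain Stable Diffusion checkpoint as a stand-in, since SSMG's own weights are not public; the checkpoint name and prompt are placeholders.

```python
# Stand-in for SSMG sampling: a vanilla Stable Diffusion pipeline run
# with the paper's reported sampler settings.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint, not SSMG
    torch_dtype=torch.float16,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # DDIM sampling
pipe = pipe.to("cuda")

image = pipe(
    "a dog sitting on a bench in a park",  # placeholder prompt
    num_inference_steps=20,  # 20 DDIM steps, as reported
    guidance_scale=9.0,      # classifier-free guidance scale of 9
    height=512,
    width=512,
).images[0]
image.save("sample.png")
```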