SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation
Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on benchmark datasets demonstrate that our SSMG achieves highly promising results, setting a new state-of-the-art across a range of metrics encompassing fidelity, diversity, and controllability. |
| Researcher Affiliation | Collaboration | Chengyou Jia¹*, Minnan Luo¹, Zhuohang Dang¹, Guang Dai²,³, Xiaojun Chang⁴,⁵, Mengmeng Wang⁶,², Jingdong Wang⁷. ¹School of Computer Science and Technology, MOEKLINNS Lab, Xi'an Jiaotong University; ²SGIT AI Lab; ³State Grid Corporation of China; ⁴University of Technology Sydney; ⁵Mohamed bin Zayed University of Artificial Intelligence; ⁶Zhejiang University; ⁷Baidu Inc |
| Pseudocode | No | The paper does not contain structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links or explicit statements regarding the availability of its source code. |
| Open Datasets | Yes | Datasets. We adopt widely recognized benchmarks COCO-Thing-Stuff (Lin et al. 2014; Caesar, Uijlings, and Ferrari 2018) for both training and evaluation. |
| Dataset Splits | Yes | It consists of 118,287 training and 5,000 validation images, which are annotated with 80 thing/object classes and 182 semantic stuff classes. |
| Hardware Specification | Yes | The model is trained on 4 NVIDIA-A100 GPUs with a batch size of 64, requiring 2 days for 50 epochs. |
| Software Dependencies | No | The paper mentions the PyTorch Lightning framework and Stable Diffusion v1.5 and v2.1, but does not provide specific version numbers for PyTorch Lightning itself or for any other libraries used. |
| Experiment Setup | Yes | During training, we take the AdamW as the optimizer within the PyTorch Lightning framework. We resize the input images to 512×512. The model is trained on 4 NVIDIA A100 GPUs with a batch size of 64, requiring 2 days for 50 epochs. During inference, we use 20 DDIM (Song, Meng, and Ermon 2020) sampling steps with classifier-free guidance (Ho and Salimans 2022) scale of 9. |
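
The Dataset Splits row maps onto the standard COCO-Stuff release, which reuses the COCO 2017 images with its own stuff annotations. A minimal loading sketch is below; the directory layout and annotation filenames follow the public COCO-Stuff distribution and are assumptions, since the paper gives no loading code.

```python
from torchvision.datasets import CocoDetection

# Paths and annotation filenames follow the COCO-Stuff release and are
# assumptions; the paper itself provides no data-loading code.
train_set = CocoDetection(
    root="coco/train2017",
    annFile="coco/annotations/stuff_train2017.json",
)
val_set = CocoDetection(
    root="coco/val2017",
    annFile="coco/annotations/stuff_val2017.json",
)
print(len(train_set), len(val_set))  # expect 118287 and 5000
```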
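The Experiment Setup and Hardware Specification rows together pin down a training configuration (AdamW, PyTorch Lightning, 4 A100s, global batch 64, 50 epochs). A minimal sketch of that configuration follows; since no code is released, the module name `SSMGModule`, the learning rate, and the placeholder loss are all assumptions, not the authors' implementation.

```python
import pytorch_lightning as pl
import torch

class SSMGModule(pl.LightningModule):
    """Hypothetical stand-in for the (unreleased) SSMG model."""

    def __init__(self, lr: float = 1e-4):  # learning rate is an assumption
        super().__init__()
        self.lr = lr
        # A single parameter stands in for the diffusion UNet weights.
        self.dummy = torch.nn.Parameter(torch.zeros(1))

    def training_step(self, batch, batch_idx):
        # The real objective is the denoising diffusion loss on 512x512
        # images conditioned on the spatial-semantic map; placeholder here.
        return self.dummy.pow(2).sum()

    def configure_optimizers(self):
        # Paper: AdamW as the optimizer within PyTorch Lightning.
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# Paper: 4 NVIDIA A100 GPUs, global batch size 64 (16 per GPU), 50 epochs.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy="ddp",
    max_epochs=50,
)
# trainer.fit(SSMGModule(), train_dataloaders=...)  # dataloader omitted
```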
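The reported inference settings (20 DDIM steps, classifier-free guidance scale 9) map directly onto a standard Stable Diffusion pipeline, sketched below with Hugging Face diffusers. The checkpoint name is illustrative: the paper builds on Stable Diffusion v1.5/v2.1 but releases no SSMG weights, and SSMG additionally conditions on a spatial-semantic map, which a plain text prompt cannot reproduce.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Checkpoint is an assumption; no SSMG weights or code are released.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Swap in the DDIM sampler cited in the paper (Song, Meng, and Ermon 2020).
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Paper: 20 DDIM sampling steps, classifier-free guidance scale 9. The text
# prompt here only exercises the reported sampler settings.
image = pipe(
    "a photo of a dog on a beach",
    num_inference_steps=20,
    guidance_scale=9.0,
).images[0]
image.save("sample.png")
```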