R3CD: Scene Graph to Image Generation with Relation-Aware Compositional Contrastive Control Diffusion

Authors: Jinxiu Liu, Qi Liu

AAAI 2024

Reproducibility variables, each with the assessed result and the LLM's supporting response:
Research Type: Experimental — Extensive experiments are conducted on two datasets, Visual Genome and COCO-Stuff, and demonstrate that the proposal outperforms existing models on both quantitative and qualitative metrics, generating more realistic and diverse images according to different scene graph specifications.
Researcher Affiliation: Academia — Jinxiu Liu, Qi Liu* (School of Future Technology, South China University of Technology; jinxiuliu0628@foxmail.com, drliuqi@scut.edu.cn).
Pseudocode: No — The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No — The paper does not provide any statement or link regarding the availability of open-source code for the described method.
Open Datasets: Yes — We have evaluated the proposal on the Visual Genome (Krishna et al. 2017) and COCO-Stuff (Caesar, Uijlings, and Ferrari 2018) datasets, where R3CD is superior to other competitors both in quantitative metrics (IS (Salimans et al. 2016) and FID (Heusel et al. 2017)) and in qualitative visualization results. (See the IS/FID sketch after the table.)
Dataset Splits: No — The paper mentions using the Visual Genome and COCO-Stuff datasets and adopting 'their evaluation settings', but it does not specify the exact percentages or counts for the training, validation, or test splits, so the data partitioning cannot be reproduced.
Hardware Specification: Yes — We use the Adam optimizer (Kingma and Ba 2014) to train diffusion models with a learning rate of 5e-5, a batch size of 16, and 700,000 iterations on an RTX 3090.
Software Dependencies: No — The paper mentions using a T5 model and a UNet as components but does not provide specific version numbers for these, or for any other software libraries, frameworks, or programming languages, that would be needed for reproducibility.
Experiment Setup: Yes — We use the Adam optimizer (Kingma and Ba 2014) to train diffusion models with a learning rate of 5e-5, a batch size of 16, and 700,000 iterations on an RTX 3090. For the contrastive loss module, we choose the trade-off parameters as 0.01 and 0.02. (See the training-configuration sketch after the table.)
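
Since the paper reports IS and FID but releases no evaluation code, the following is a minimal sketch of how those metrics are commonly computed, assuming the torchmetrics library (which requires torch-fidelity for these metrics) and randomly generated uint8 image tensors as stand-ins. Nothing here reflects the authors' actual evaluation pipeline.

```python
# Hedged sketch: IS and FID via torchmetrics. The library choice and the
# dummy image tensors are assumptions, not the authors' evaluation code.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Both metrics expect uint8 images of shape (N, 3, H, W) by default.
real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

# FID compares Inception-v3 feature statistics of real vs. generated images.
fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())

# IS scores generated images alone; compute() returns (mean, std) over splits.
inception = InceptionScore()
inception.update(fake_images)
is_mean, is_std = inception.compute()
print("IS:", is_mean.item(), "+/-", is_std.item())
```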
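For the reported setup (Adam, learning rate 5e-5, batch size 16, 700,000 iterations, contrastive trade-off weights 0.01 and 0.02), a minimal PyTorch training-loop sketch follows. The model, dataset, and the three loss functions are hypothetical placeholders invented for illustration; only the optimizer settings, batch size, iteration count, and trade-off weights come from the paper.

```python
# Hedged sketch of the reported training configuration. Toy stand-ins are
# used so the sketch runs; the real model is R3CD's relation-aware
# diffusion UNet, which is not publicly released.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # placeholder model
train_dataset = TensorDataset(torch.randn(64, 3, 64, 64))          # placeholder data

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)        # reported learning rate
loader = DataLoader(train_dataset, batch_size=16, shuffle=True)  # reported batch size

LAMBDA_1, LAMBDA_2 = 0.01, 0.02  # reported trade-off weights for the contrastive loss module
MAX_ITERS = 700_000              # reported iteration count (trained on an RTX 3090)

def denoising_loss(out):
    # Hypothetical placeholder for the diffusion denoising objective.
    return out.pow(2).mean()

def contrastive_term_1(out):
    # Hypothetical placeholder for the first contrastive term (weight 0.01).
    return out.abs().mean()

def contrastive_term_2(out):
    # Hypothetical placeholder for the second contrastive term (weight 0.02).
    return out.abs().mean()

step = 0
while step < MAX_ITERS:
    for (images,) in loader:
        out = model(images)
        loss = (denoising_loss(out)
                + LAMBDA_1 * contrastive_term_1(out)
                + LAMBDA_2 * contrastive_term_2(out))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= MAX_ITERS:
            break
```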