Image Content Generation with Causal Reasoning

Authors: Xiaochuan Li, Baoyu Fan, Runze Zhang, Liang Jin, Di Wang, Zhenhua Guo, Yaqian Zhao, Rengang Li

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we perform extensive experiments and analyses, including visualizations of the generated content and discussions on the potentials and limitations.
Researcher Affiliation | Collaboration | 1 Inspur Electronic Information Industry Co., Ltd., 2 Nankai University, 3 Tsinghua University, 4 Shandong Massive Information Technology Research Institute
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | The code and data are publicly available under the license of CC BY-NC-SA 4.0 for academic and non-commercial usage at: https://github.com/IEIT-AGI/MIX-Shannon/blob/main/projects/VQAI/lgd_vqai.md.
Open Datasets | Yes | Hence, we propose a new image generation task called visual question answering with image (VQAI) and establish a dataset of the same name based on the classic Tom and Jerry animated series. [...] The code and data are publicly available under the license of CC BY-NC-SA 4.0 for academic and non-commercial usage at: https://github.com/IEIT-AGI/MIX-Shannon/blob/main/projects/VQAI/lgd_vqai.md.
Dataset Splits | Yes | In the dataset, we divided 17,524 samples into 15,524, 1,000 and 1,000, corresponding to the training, validation and testing sets. (A split sketch is given below the table.)
Hardware Specification | Yes | All experiments are run on an A100×8 server.
Software Dependencies | No | The paper names the models it builds on ('T5-XXL', 'stable diffusion', 'Flan-T5-XXL') but does not provide version numbers for ancillary software libraries or frameworks such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | All initial learning rates are set to 3e-5. In the comparison experiments, we use ADAM (Kingma and Ba 2014) as the optimizer. We set the batch size to 16 and the epoch to 20. (A training-configuration sketch is given below the table.)
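For the Dataset Splits row, the paper only reports the split sizes (15,524 / 1,000 / 1,000 out of 17,524 samples), not how the split was produced. The following is a minimal sketch, assuming the samples are held in a single list and partitioned by fixed counts; the shuffling strategy, seed, and function name are illustrative assumptions, not the authors' released code.

```python
import random

# Hedged sketch: only the split sizes come from the paper
# (15,524 / 1,000 / 1,000 out of 17,524 samples); the shuffle,
# seed, and helper name are assumptions for illustration.
TRAIN_SIZE, VAL_SIZE, TEST_SIZE = 15_524, 1_000, 1_000

def split_vqai(samples, seed=0):
    """Partition a list of VQAI samples into train/val/test by fixed counts."""
    assert len(samples) == TRAIN_SIZE + VAL_SIZE + TEST_SIZE  # 17,524 total
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    train = shuffled[:TRAIN_SIZE]
    val = shuffled[TRAIN_SIZE:TRAIN_SIZE + VAL_SIZE]
    test = shuffled[TRAIN_SIZE + VAL_SIZE:]
    return train, val, test
```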
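For the Experiment Setup row, the quoted hyperparameters (initial learning rate 3e-5, Adam optimizer, batch size 16, 20 epochs) can be read as the training configuration below. This is a hedged PyTorch sketch under stated assumptions: `model`, `train_set`, and `compute_loss` are placeholders for the authors' VQAI model, dataset, and loss, which are available in the repository linked above rather than in this form.

```python
import torch
from torch.utils.data import DataLoader

# Hedged sketch of the reported hyperparameters only
# (lr = 3e-5, Adam, batch size 16, 20 epochs).
# `model`, `train_set`, and `compute_loss` are placeholders,
# not the authors' actual implementation.
def train(model, train_set, compute_loss, device="cuda"):
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    model.to(device).train()
    for epoch in range(20):
        for batch in loader:
            optimizer.zero_grad()
            loss = compute_loss(model, batch)  # task-specific loss (assumed)
            loss.backward()
            optimizer.step()
```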