Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation

Authors: Yaqi Cai, Shancheng Fang, Yadong Qu, Xiaorui Wang, Meng Shao, Hongtao Xie

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that IterMeme significantly advances the field of meme creation by delivering consistently high-quality outcomes. The code, model, and dataset will be open-sourced to the community.
Researcher Affiliation Collaboration Yaqi Cai^1, Shancheng Fang^2, Yadong Qu^1, Xiaorui Wang^2, Meng Shao^1, Hongtao Xie^1; 1: University of Science and Technology of China, 2: Yuan Shi Technology
Pseudocode No The paper describes methods and processes in narrative text and with mathematical formulations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes The code, model, and dataset will be open-sourced to the community.
Open Datasets Yes In the first stage, we utilize the MemeCap dataset introduced in section 3.3, along with a subset of the I2T category from the Oogiri-GO dataset introduced in [Zhong et al., 2024].
Dataset Splits No The paper mentions the MemeCap dataset including 13,977 English samples and 22,457 Chinese samples, and an evaluation dataset of 60 IPs. However, it does not specify explicit training, validation, or test splits (e.g., percentages or counts for each split) for these datasets in the main text.
Hardware Specification No We acknowledge the support of GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC. We also thank the USTC supercomputing center for providing computational resources for this project.
Software Dependencies Yes The unified MLLM for understanding and generation is built upon DreamLLM [Dong et al., 2023] as the base model, with SD2.1 [Rombach et al., 2022] serving as the visual decoder.
Experiment Setup Yes In the first training stage, the AdamW [Loshchilov, 2017] optimizer is employed with a learning rate of 1 × 10^-5 and a weight decay parameter of 0. The batch size is set to 16, and gradient accumulation is performed over 2 steps to optimize computational efficiency.
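The reported first-stage settings (AdamW, learning rate 1 × 10^-5, weight decay 0, batch size 16, gradient accumulation over 2 steps) can be sketched as below. This is a minimal illustration of the gradient-accumulation schedule only; the toy `grad_fn` and `apply_update` hooks are placeholders, not the authors' training code.

```python
# Hyperparameters as reported for the first training stage.
LEARNING_RATE = 1e-5
WEIGHT_DECAY = 0.0
BATCH_SIZE = 16
ACCUM_STEPS = 2

def effective_batch_size(batch_size, accum_steps):
    """Gradient accumulation multiplies the effective batch size."""
    return batch_size * accum_steps

def train_loop(num_micro_batches, grad_fn, apply_update):
    """Accumulate gradients over ACCUM_STEPS micro-batches, then update.

    grad_fn(step) stands in for a backward pass on one micro-batch;
    apply_update(grad) stands in for one optimizer step.
    Returns the number of optimizer updates performed.
    """
    accumulated = 0.0
    updates = 0
    for step in range(1, num_micro_batches + 1):
        # Divide by ACCUM_STEPS so the accumulated gradient is an average.
        accumulated += grad_fn(step) / ACCUM_STEPS
        if step % ACCUM_STEPS == 0:
            apply_update(accumulated)
            accumulated = 0.0
            updates += 1
    return updates
```

With batch size 16 and accumulation over 2 steps, each optimizer update effectively sees 32 samples, which is the usual motivation for this setting on memory-limited GPUs.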