Multimodal Summarization with Guidance of Multimodal Reference

Authors: Junnan Zhu, Yu Zhou, Jiajun Zhang, Haoran Li, Chengqing Zong, Changliang Li (pp. 9749-9756)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results have shown that our proposed model achieves the new state-of-the-art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments."
Researcher Affiliation | Collaboration | (1) National Laboratory of Pattern Recognition, Institute of Automation, CAS; (2) University of Chinese Academy of Sciences; (3) CAS Center for Excellence in Brain Science and Intelligence Technology; (4) JD AI Research; (5) Kingsoft AI Lab
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | "We use the MSMO dataset (Zhu et al. 2018) which contains online news articles (723 tokens on average) paired with multiple image-caption pairs (6.58 images on average) and multi-sentence summaries (70 tokens on average)."
Dataset Splits | Yes | The dataset includes 293,965 training pairs, 10,355 validation pairs, and 10,261 test pairs.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for its experiments.
Software Dependencies | No | The paper mentions components such as "VGG19 pretrained on ImageNet" but does not specify version numbers for any software dependencies or libraries used in the implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We set λ to 1.0 and the image number K (the target when calculating the cross-entropy loss) to 3 here." Discussion on λ (see Table 4): the authors study how model performance changes as λ varies from 0.5 to 2.0. Discussion on K (see Table 5): Table 5 reports how model performance varies with K (the number of images in the target).
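The paper releases no code, but the Experiment Setup row describes a λ-weighted joint objective: a summary-generation loss plus λ times a cross-entropy loss over which K images belong in the target. As a minimal sketch of how such a combination is typically computed (the function name `joint_loss`, the softmax over image scores, and the per-token NLL form are illustrative assumptions, not the authors' implementation):

```python
import math


def joint_loss(summary_token_probs, image_scores, target_image_ids, lam=1.0):
    """Illustrative lambda-weighted multimodal objective (not the paper's code).

    summary_token_probs: model probability assigned to each reference-summary token.
    image_scores: one raw relevance score per candidate image.
    target_image_ids: indices of the K reference images (the paper uses K = 3).
    lam: the weight lambda on the image-selection loss (the paper uses 1.0).
    """
    # Text loss: mean negative log-likelihood of the reference summary tokens.
    text_loss = -sum(math.log(p) for p in summary_token_probs) / len(summary_token_probs)

    # Image loss: softmax-normalize the scores, then take the mean
    # cross-entropy over the K target images.
    exps = [math.exp(s) for s in image_scores]
    z = sum(exps)
    image_probs = [e / z for e in exps]
    image_loss = -sum(math.log(image_probs[i]) for i in target_image_ids) / len(target_image_ids)

    return text_loss + lam * image_loss
```

With `lam=0.0` the objective reduces to the text loss alone, which is one way to probe the sensitivity range (0.5 to 2.0) that Table 4 of the paper explores.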