Multimodal Summarization with Guidance of Multimodal Reference

Authors: Junnan Zhu, Yu Zhou, Jiajun Zhang, Haoran Li, Chengqing Zong, Changliang Li (pp. 9749-9756)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results have shown that our proposed model achieves the new state-of-the-art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments."
Researcher Affiliation | Collaboration | (1) National Laboratory of Pattern Recognition, Institute of Automation, CAS; (2) University of Chinese Academy of Sciences; (3) CAS Center for Excellence in Brain Science and Intelligence Technology; (4) JD AI Research; (5) Kingsoft AI Lab
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | "We use the MSMO dataset (Zhu et al. 2018) which contains online news articles (723 tokens on average) paired with multiple image-caption pairs (6.58 images on average) and multi-sentence summaries (70 tokens on average)."
Dataset Splits | Yes | The dataset includes 293,965 training pairs, 10,355 validation pairs, and 10,261 test pairs.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for its experiments.
Software Dependencies | No | The paper mentions components such as "VGG19 pretrained on ImageNet" but does not specify version numbers for any software dependencies or libraries used in the implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "We set λ to 1.0 and the image number K (the target when calculating the cross-entropy loss) to 3 here." Discussion on λ (see Table 4): the authors study how model performance changes as λ varies from 0.5 to 2.0. Discussion on K (see Table 5): Table 5 reports how model performance varies with K (the number of images in the target).
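The paper releases no code, but the Experiment Setup row describes a λ-weighted joint objective: a summary-generation loss plus λ times a cross-entropy loss over which K images belong in the target. As a minimal sketch of how such a combination is typically computed (the function name `joint_loss`, the softmax over image scores, and the per-token NLL form are illustrative assumptions, not the authors' implementation):

```python
import math


def joint_loss(summary_token_probs, image_scores, target_image_ids, lam=1.0):
    """Illustrative lambda-weighted multimodal objective (not the paper's code).

    summary_token_probs: model probability assigned to each reference-summary token.
    image_scores: one raw relevance score per candidate image.
    target_image_ids: indices of the K reference images (the paper uses K = 3).
    lam: the weight lambda on the image-selection loss (the paper uses 1.0).
    """
    # Text loss: mean negative log-likelihood of the reference summary tokens.
    text_loss = -sum(math.log(p) for p in summary_token_probs) / len(summary_token_probs)

    # Image loss: softmax-normalize the scores, then take the mean
    # cross-entropy over the K target images.
    exps = [math.exp(s) for s in image_scores]
    z = sum(exps)
    image_probs = [e / z for e in exps]
    image_loss = -sum(math.log(image_probs[i]) for i in target_image_ids) / len(target_image_ids)

    return text_loss + lam * image_loss
```

With `lam=0.0` the objective reduces to the text loss alone, which is one way to probe the sensitivity range (0.5 to 2.0) that Table 4 of the paper explores.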