Multimodal Summarization with Guidance of Multimodal Reference
Authors: Junnan Zhu, Yu Zhou, Jiajun Zhang, Haoran Li, Chengqing Zong, Changliang Li (pp. 9749–9756)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results have shown that our proposed model achieves the new state-of-the-art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments. |
| Researcher Affiliation | Collaboration | (1) National Laboratory of Pattern Recognition, Institute of Automation, CAS; (2) University of Chinese Academy of Sciences; (3) CAS Center for Excellence in Brain Science and Intelligence Technology; (4) JD AI Research; (5) Kingsoft AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use the MSMO dataset (Zhu et al. 2018) which contains online news articles (723 tokens on average) paired with multiple image-caption pairs (6.58 images on average) and multi-sentence summaries (70 tokens on average). |
| Dataset Splits | Yes | The dataset includes 293,965 training pairs, 10,355 validation pairs, and 10,261 test pairs. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions models and components such as 'VGG19 pretrained on ImageNet' but does not specify version numbers for any ancillary software dependencies or libraries used for implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set λ to 1.0 and the image number K (the target when calculating the cross-entropy loss) to 3. Discussion on λ (see Table 4): to study the impact of λ, we conduct an experiment on how model performance changes as λ varies from 0.5 to 2.0. Discussion on K (see Table 5): Table 5 depicts how model performance varies with K (the number of target images). |
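The experiment-setup row above describes a λ-weighted objective that combines the text summarization loss with a cross-entropy loss over K target images. The paper excerpt only states λ = 1.0 and K = 3; the exact form of the combination below (and the averaging of the cross-entropy over the K reference images) is an illustrative assumption, not the authors' published implementation:

```python
import math

def multitask_loss(summary_nll, image_probs, target_images, lam=1.0):
    """Sketch of a lambda-weighted multi-task objective.

    summary_nll   -- negative log-likelihood of the generated text summary
    image_probs   -- predicted probability for each image in the article
    target_images -- indices of the K reference images (K = 3 in the paper)
    lam           -- trade-off weight lambda (1.0 in the paper)
    """
    # Cross-entropy averaged over the K target images (averaging is an
    # assumption; the paper does not spell out the reduction).
    image_ce = -sum(math.log(image_probs[i]) for i in target_images)
    image_ce /= len(target_images)
    # Total objective: text loss plus lambda-weighted image-selection loss.
    return summary_nll + lam * image_ce
```

With lam = 0 the objective reduces to the plain summarization loss, which is how Tables 4 and 5 can probe the sensitivity to λ and K independently.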