Aspect-Aware Multimodal Summarization for Chinese E-Commerce Products
Authors: Haoran Li, Peng Yuan, Song Xu, Youzheng Wu, Xiaodong He, Bowen Zhou
AAAI 2020, pp. 8188–8195 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on this dataset demonstrate that our models significantly outperform the comparative methods in terms of both the ROUGE score and manual evaluations. |
| Researcher Affiliation | Industry | JD AI Research {lihaoran24, yuanpeng29, xusong28, wuyouzheng1, xiaodong.he, bowen.zhou}@jd.com |
| Pseudocode | No | The paper describes methods using mathematical equations and textual descriptions, but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our dataset and code are available at https://github.com/hrlinlp/cepsum |
| Open Datasets | Yes | We construct CEPSUM, a Chinese E-commerce Product SUMmarization dataset that contains approximately 1.4 million manually created product summaries that are paired with detailed product information, including an image, a title, and other textual descriptions for each product. Our dataset and code are available at https://github.com/hrlinlp/cepsum |
| Dataset Splits | Yes | Table 2 (corpus statistics) reports Home Appliances with 437,646 / 10,000 / 10,000 train/valid/test examples, Clothing with 790,297 / 10,000 / 10,000, and Cases&Bags with 97,510 / 5,000 / 5,000. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or processing speeds used for running experiments. |
| Software Dependencies | No | The paper mentions 'ROUGE-1.5.5 toolkit' but does not provide other specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow, CUDA versions) needed to replicate the experiment. |
| Experiment Setup | Yes | We set the sizes of the word embedding and the LSTM hidden state to 300 and 512, respectively. We set the initial learning rate for Adam to 5 × 10⁻⁴. The mini-batch size is set to 16. During training, we test ROUGE-2 (Lin 2004) F1 score and perplexity using the development set every 5,000 batches, and we halve the learning rate if the model's ROUGE-2 score drops for 7 consecutive tests. We first train our models without coverage until they converge using an early stopping strategy, and then we add the coverage mechanism to further train the models. During testing, we use beam search with a beam size of 10 to generate the summaries, and character-based trigram repetition avoidance (Paulus, Xiong, and Socher 2018) is applied. |
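
The Experiment Setup row above spells out a concrete training schedule: Adam at 5 × 10⁻⁴, batch size 16, dev-set ROUGE-2 checks every 5,000 batches, and learning-rate halving after 7 consecutive drops. Below is a minimal sketch of that schedule, assuming a PyTorch-style training loop; `model`, `train_batches`, `dev_set`, and `compute_rouge2` are hypothetical placeholders, not interfaces from the authors' released code.

```python
import torch

# Hyperparameters reported in the paper's experiment setup.
EMBED_SIZE = 300    # word embedding size
HIDDEN_SIZE = 512   # LSTM hidden state size
BATCH_SIZE = 16
INIT_LR = 5e-4
EVAL_EVERY = 5000   # evaluate on the dev set every 5,000 batches
PATIENCE = 7        # halve LR after 7 consecutive ROUGE-2 drops
BEAM_SIZE = 10      # beam size at test time (not used in this loop)


def train(model, train_batches, dev_set, compute_rouge2):
    """Sketch of the ROUGE-driven LR-halving schedule described above.

    `model`, `train_batches`, `dev_set`, and `compute_rouge2` are
    hypothetical placeholders, not the authors' actual code.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR)
    best_rouge2, drops = float("-inf"), 0

    for step, batch in enumerate(train_batches, start=1):
        optimizer.zero_grad()
        loss = model(batch)  # assumed to return the training loss
        loss.backward()
        optimizer.step()

        if step % EVAL_EVERY == 0:
            rouge2 = compute_rouge2(model, dev_set)
            if rouge2 < best_rouge2:
                drops += 1
            else:
                best_rouge2, drops = rouge2, 0
            if drops >= PATIENCE:
                # Halve the learning rate after 7 consecutive drops.
                for group in optimizer.param_groups:
                    group["lr"] /= 2
                drops = 0
```

Halving the learning rate on a validation-metric plateau (here driven by ROUGE-2 rather than loss) is the schedule the setup describes; the early stopping criterion and the subsequent coverage fine-tuning stage would sit on top of a loop like this.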