MultiSumm: Towards a Unified Model for Multi-Lingual Abstractive Summarization
Authors: Yue Cao, Xiaojun Wan, Jin-ge Yao, Dian Yu
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on summarization datasets for five rich-resource languages: English, Chinese, French, Spanish, and German, as well as two low-resource languages: Bosnian and Croatian. Experimental results show that our proposed model significantly outperforms a multi-lingual baseline model. |
| Researcher Affiliation | Collaboration | Yue Cao (1,2,3), Xiaojun Wan (1,2,3), Jin-ge Yao (1), Dian Yu (4); 1 Wangxuan Institute of Computer Technology, Peking University; 2 Center for Data Science, Peking University; 3 The MOE Key Laboratory of Computational Linguistics, Peking University; 4 Tencent AI Lab; {yuecao, wanxiaojun, yaojinge}@pku.edu.cn, yudian@tencent.com |
| Pseudocode | Yes | Algorithm 1 Multi-Lingual Training Algorithm for Abstractive Text Summarization *(training-loop sketch below)* |
| Open Source Code | Yes | https://github.com/ycao1996/Multi-Lingual-Summarization |
| Open Datasets | Yes | We use the Europarl-v5 dataset (Koehn 2005) for English, German, Spanish, and French. ... We use the News-Commentary-v13 dataset (Tiedemann 2012) for Chinese... We use the SETIMES dataset (Tiedemann 2012) for Bosnian and Croatian... We use the Gigaword dataset for English, French, and Spanish summarization (Graff et al. 2003; Mendonça, Graff, and DiPersio 2009a; 2009b). ... We use the LCSTS dataset (Hu, Chen, and Zhu 2015) for Chinese summarization. ... We use the SWISS dataset for German summarization. ... As there is no existing summarization dataset for the low-resource languages Bosnian and Croatian, we first build a new summarization dataset for the two languages. |
| Dataset Splits | Yes | We use the officially divided training sets, validation sets, and test sets. ... we use part I as the training set, part II as the validation set, and samples with scores of 3, 4, and 5 in part III as the test set. The numbers of training, validation, and test pairs are 2,400,591, 10,666, and 725, respectively. ... We randomly split 80% of the samples as the training set, 10% of the samples as the validation set, and 10% of the samples as the test set. *(split sketch below)* |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Fairseq toolkit' and 'subword-nmt toolkit' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For transformer architectures, the model hidden size, feed-forward hidden size, the number of layers, and the number of heads are 512, 2,048, 6, and 8, respectively. ... the batch size is set to 4,000 for multi-lingual models and 1,000 for individual models. ... We use warm-up learning rate (Goyal et al. 2017) for the first 4,000 steps, and the initial warm-up learning rate is set to 1e-7. We use the dropout technique and set the dropout rate to 0.2. We use beam search for inference, and the beam size is set to 5 according to the results on the validation set. *(configuration sketch below)* |
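The paper's Algorithm 1 (referenced in the Pseudocode row above) is not quoted in full here. As a rough illustration of the round-robin multi-task schedule such an algorithm describes, the following is a minimal sketch; the `train_multilingual` helper, the task tags, and the model interface are hypothetical placeholders, not the authors' code.

```python
import itertools

def train_multilingual(model, optimizer, task_loaders, num_steps):
    """Hypothetical round-robin schedule over per-language tasks.

    task_loaders maps a task tag (e.g. 'sum-en', 'sum-zh', 'mt-en-de')
    to an iterable yielding (src_batch, tgt_batch) pairs. The paper's
    actual schedule is specified by its Algorithm 1.
    """
    iters = {tag: itertools.cycle(dl) for tag, dl in task_loaders.items()}
    for _ in range(num_steps):
        for tag, batches in iters.items():
            src, tgt = next(batches)
            loss = model(src, tgt, task=tag)  # assumed to return a scalar seq2seq loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Cycling through every task once per outer step keeps the shared encoder-decoder from drifting toward any single language, which is the usual motivation for an interleaved multi-lingual schedule.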
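The 80/10/10 random split reported for the newly built Bosnian/Croatian data (Dataset Splits row) is easy to restate as code. A minimal sketch, assuming a flat list of samples; the helper name and fixed seed are my own:

```python
import random

def split_80_10_10(samples, seed=0):
    """Shuffle, then split into 80% train / 10% validation / 10% test."""
    rng = random.Random(seed)
    shuffled = list(samples)                    # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    return (shuffled[: int(0.8 * n)],           # training set
            shuffled[int(0.8 * n): int(0.9 * n)],  # validation set
            shuffled[int(0.9 * n):])            # test set
```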
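The Experiment Setup row pins down the architecture well enough to restate it as a configuration. Below is a minimal PyTorch sketch using the reported sizes; the paper trained with the Fairseq toolkit, so this only illustrates the stated hyperparameters rather than reproducing the authors' setup, and the peak learning rate is an assumption (the paper reports only the 1e-7 warm-up start and the 4,000-step ramp).

```python
import torch
import torch.nn as nn

# Reported architecture: hidden size 512, FFN size 2,048, 6 layers, 8 heads.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.2,                    # reported dropout rate
)

PEAK_LR = 1e-3                      # assumed; not stated in the paper
WARMUP_INIT_LR = 1e-7               # reported initial warm-up learning rate
WARMUP_STEPS = 4000                 # reported warm-up length

optimizer = torch.optim.Adam(model.parameters(), lr=PEAK_LR)

def lr_lambda(step):
    # Linear ramp from WARMUP_INIT_LR to PEAK_LR over the first 4,000 steps,
    # in the spirit of the gradual warm-up of Goyal et al. (2017); flat after.
    if step >= WARMUP_STEPS:
        return 1.0
    frac = step / WARMUP_STEPS
    return (WARMUP_INIT_LR + frac * (PEAK_LR - WARMUP_INIT_LR)) / PEAK_LR

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)  # call scheduler.step() once per update
```

At inference time the paper decodes with beam search at beam size 5, chosen on the validation set.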