Sequence Level Contrastive Learning for Text Summarization
Authors: Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei
AAAI 2022, pages 11556-11565
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we find our proposed contrastive learning based model SeqCo consistently improves upon a strong abstractive summarization model based on BART (Lewis et al. 2020) across three different summarization datasets (i.e., CNN/Daily Mail (Hermann et al. 2015), New York Times (Sandhaus 2008) and XSum (Narayan, Cohen, and Lapata 2018)). Human evaluation also shows that our model SeqCo achieves better faithfulness ratings compared to its counterpart without contrastive objectives. |
| Researcher Affiliation | Collaboration | Shusheng Xu, 1* Xingxing Zhang, 2 Yi Wu, 1,3 Furu Wei 2 — 1 IIIS, Tsinghua University, Beijing, China; 2 Microsoft Research Asia, Beijing, China; 3 Shanghai Qi Zhi Institute, Shanghai, China. xuss20@mails.tsinghua.edu.cn, xizhang@microsoft.com, jxwuyi@gmail.com, fuwei@microsoft.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It uses mathematical equations and diagrams to describe the model. |
| Open Source Code | Yes | We release our code at https://github.com/xssstory/SeqCo. |
| Open Datasets | Yes | We conduct our experiments on three summarization datasets. The CNN/Daily Mail dataset (CNNDM; Hermann et al. 2015)... The New York Times dataset (NYT; Sandhaus 2008)... The articles in the XSum dataset (Narayan, Cohen, and Lapata 2018)... |
| Dataset Splits | Yes | CNNDM...contains 287,226 articles for training, 13,368 for validation and 11,490 for test. NYT...38,264 articles for training and 4,000 articles for validation. XSum...204,045 articles for training, 11,332 articles for validation and 11,334 articles for test. |
| Hardware Specification | Yes | The models for CNNDM are trained on 8 Tesla V100 GPUs, and the models for the other datasets are trained on 4 Tesla V100 GPUs. |
| Software Dependencies | Yes | ROUGE scores are computed with the ROUGE-1.5.5.pl script. |
| Experiment Setup | Yes | The warmup steps, total number of updates, peak learning rate and batch size are tuned on validation sets and are different across datasets, which are 1000, 20000, 4e-5, 128 on CNNDM, 500, 5000, 2e-5, 64 on NYT, and 500, 15000, 6e-5, 64 on XSum. Parameters ξ in f_ξ are updated following Equation (13) with τ = 0.99 (see the momentum-update sketch after this table). We employ label smoothing of 0.1 (Szegedy et al. 2016; Vaswani et al. 2017). |
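
The Equation (13) cited in the experiment setup is the momentum update of the target-encoder parameters ξ from the online-encoder parameters θ with τ = 0.99. Assuming the standard exponential-moving-average form ξ ← τξ + (1 − τ)θ, the PyTorch sketch below illustrates such an update; the function and argument names are illustrative assumptions and are not taken from the released SeqCo code.

```python
import torch


@torch.no_grad()
def momentum_update(online_params, target_params, tau=0.99):
    """EMA update of the target ("momentum") parameters xi from the online
    parameters theta: xi <- tau * xi + (1 - tau) * theta.
    Illustrative sketch only; names are not from the SeqCo release."""
    for theta, xi in zip(online_params, target_params):
        xi.data.mul_(tau).add_(theta.data, alpha=1.0 - tau)


# Usage sketch: after each optimizer step on the online model, refresh the
# target model's parameters, e.g.
# momentum_update(online_model.parameters(), target_model.parameters(), tau=0.99)
```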