Sequence Level Contrastive Learning for Text Summarization
Authors: Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei
AAAI 2022, pages 11556-11565
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we find our proposed contrastive learning based model SeqCo consistently improves upon a strong abstractive summarization model based on BART (Lewis et al. 2020) across three different summarization datasets (i.e., CNN/Daily Mail (Hermann et al. 2015), New York Times (Sandhaus 2008) and XSum (Narayan, Cohen, and Lapata 2018)). Human evaluation also shows that our model SeqCo achieves better faithfulness ratings compared to its counterpart without contrastive objectives. |
| Researcher Affiliation | Collaboration | Shusheng Xu, 1* Xingxing Zhang, 2 Yi Wu, 1,3 Furu Wei 2 — 1 IIIS, Tsinghua University, Beijing, China; 2 Microsoft Research Asia, Beijing, China; 3 Shanghai Qi Zhi Institute, Shanghai, China. xuss20@mails.tsinghua.edu.cn, xizhang@microsoft.com, jxwuyi@gmail.com, fuwei@microsoft.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It uses mathematical equations and diagrams to describe the model. |
| Open Source Code | Yes | We release our code at https://github.com/xssstory/SeqCo. |
| Open Datasets | Yes | We conduct our experiments on three summarization datasets. The CNN/Daily Mail dataset (CNNDM; Hermann et al. 2015)... The New York Times dataset (NYT; Sandhaus 2008)... The articles in the XSum dataset (Narayan, Cohen, and Lapata 2018)... |
| Dataset Splits | Yes | CNNDM...contains 287,226 articles for training, 13,368 for validation and 11,490 for test. NYT...38,264 articles for training and 4,000 articles for validation. XSum...204,045 articles for training, 11,332 articles for validation and 11,334 articles for test. |
| Hardware Specification | Yes | The models for CNNDM are trained on 8 Tesla V100 GPUs, and the models for the other datasets are trained on 4 Tesla V100 GPUs. |
| Software Dependencies | Yes | ROUGE scores are computed with the ROUGE-1.5.5.pl script. |
| Experiment Setup | Yes | The warmup steps, total number of updates, peak learning rate and batch size are tuned on validation sets and are different across datasets, which are 1000, 20000, 4e-5, 128 on CNNDM, 500, 5000, 2e-5, 64 on NYT, and 500, 15000, 6e-5, 64 on XSum. Parameters ξ in f_ξ are updated following Equation (13) with τ = 0.99 (see the momentum-update sketch after this table). We employ label smoothing of 0.1 (Szegedy et al. 2016; Vaswani et al. 2017). |
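
The Equation (13) cited in the experiment setup is the momentum update of the target-encoder parameters ξ from the online-encoder parameters θ with τ = 0.99. Assuming the standard exponential-moving-average form ξ ← τξ + (1 − τ)θ, the PyTorch sketch below illustrates such an update; the function and argument names are illustrative assumptions and are not taken from the released SeqCo code.

```python
import torch


@torch.no_grad()
def momentum_update(online_params, target_params, tau=0.99):
    """EMA update of the target ("momentum") parameters xi from the online
    parameters theta: xi <- tau * xi + (1 - tau) * theta.
    Illustrative sketch only; names are not from the SeqCo release."""
    for theta, xi in zip(online_params, target_params):
        xi.data.mul_(tau).add_(theta.data, alpha=1.0 - tau)


# Usage sketch: after each optimizer step on the online model, refresh the
# target model's parameters, e.g.
# momentum_update(online_model.parameters(), target_model.parameters(), tau=0.99)
```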