PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter Liu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores. |
| Researcher Affiliation | Collaboration | Data Science Institute, Imperial College London, London, UK; Brain Team, Google Research, Mountain View, CA, USA. |
| Pseudocode | Yes | Algorithm 1 Sequential Sentence Selection |
| Open Source Code | Yes | The training code and instructions for using model checkpoints can be found at https://github.com/google-research/pegasus |
| Open Datasets | Yes | For downstream summarization, we only used public abstractive summarization datasets, and access them through TensorFlow Summarization Datasets (https://www.tensorflow.org/datasets/catalog/overview), which provides publicly reproducible code for dataset processing and train/validation/test splits. |
| Dataset Splits | Yes | We used train/validation/test ratio of 80/10/10 if no split was provided, and 10% train split as validation if there was no validation split. |
| Hardware Specification | No | The paper describes model architectures (e.g., layers, hidden size), but does not specify the type or model of hardware (e.g., GPU, CPU, TPU) used for training or experimentation. |
| Software Dependencies | No | The paper mentions software components like Adafactor, byte-pair encoding (BPE), and SentencePiece Unigram, but does not provide specific version numbers for these or other libraries used for the experiments. |
| Experiment Setup | Yes | We pre-trained PEGASUS_BASE with a batch size of 256 and PEGASUS_LARGE with a batch size of 8192. We used sinusoidal positional encoding following Vaswani et al. (2017). For optimization, both pre-training and fine-tuning used Adafactor (Shazeer & Stern, 2018) with square root learning rate decay and dropout rate of 0.1. We used greedy-decoding for studies in Section 6.1, and used beam-search with a length-penalty, α, as in Wu et al. (2016) for the final large model. All experiment hyperparameters can be found in Appendix C and reported numbers are in Appendix D and E. PEGASUS_BASE had L = 12, H = 768, F = 3072, A = 12 and PEGASUS_LARGE had L = 16, H = 1024, F = 4096, A = 16. |
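
The Pseudocode row above points to Algorithm 1 (Sequential Sentence Selection), the greedy gap-sentence selection used during pre-training. Below is a minimal, hedged sketch of that idea: sentences are picked one at a time so that the selected set best summarizes the rest of the document. The function names are my own, and a simple unigram-F1 stands in for the ROUGE1-F1 scorer used in the paper; this is an illustration, not the authors' implementation.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Unigram overlap F1, a simple stand-in for ROUGE1-F1."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_gap_sentences(sentences, num_to_select):
    """Greedily select sentence indices whose union best covers the remaining document."""
    selected = []
    remaining = list(range(len(sentences)))
    for _ in range(min(num_to_select, len(sentences))):
        def gain(i):
            chosen = selected + [i]
            summary = " ".join(sentences[j] for j in chosen).split()
            rest = " ".join(sentences[j] for j in range(len(sentences))
                            if j not in chosen).split()
            return unigram_f1(summary, rest)
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)  # indices of sentences to mask and use as generation targets


if __name__ == "__main__":
    doc = [
        "PEGASUS is pre-trained with gap-sentence generation.",
        "Important sentences are masked and generated from the rest of the document.",
        "The weather was pleasant that day.",
    ]
    print(select_gap_sentences(doc, num_to_select=1))
```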
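
For quick reference, the architecture and optimization settings quoted in the Experiment Setup row can be collected into a small configuration sketch. The field names here are my own; the values are exactly those reported above.

```python
# Hedged summary of the reported settings; not a file from the released codebase.
PEGASUS_CONFIGS = {
    "base": dict(num_layers=12, hidden_size=768, ffn_size=3072, num_heads=12,
                 pretrain_batch_size=256),
    "large": dict(num_layers=16, hidden_size=1024, ffn_size=4096, num_heads=16,
                  pretrain_batch_size=8192),
}

COMMON_SETTINGS = dict(
    optimizer="Adafactor",
    lr_schedule="square-root decay",
    dropout=0.1,
    positional_encoding="sinusoidal",
)
```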