reproducibilityindex.ai

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter Liu

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated our best PEGASUS model on 12 downstream summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills. Experiments demonstrate it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores.
Researcher Affiliation	Collaboration	(1) Data Science Institute, Imperial College London, London, UK; (2) Brain Team, Google Research, Mountain View, CA, USA.
Pseudocode	Yes	Algorithm 1 Sequential Sentence Selection
Open Source Code	Yes	The training code and instructions for using model checkpoints can be found at https://github.com/google-research/pegasus
Open Datasets	Yes	For downstream summarization, we only used public abstractive summarization datasets, and access them through TensorFlow Summarization Datasets (https://www.tensorflow.org/datasets/catalog/overview), which provides publicly reproducible code for dataset processing and train/validation/test splits.
Dataset Splits	Yes	We used train/validation/test ratio of 80/10/10 if no split was provided, and 10% train split as validation if there was no validation split.
Hardware Specification	No	The paper describes model architectures (e.g., layers, hidden size), but does not specify the type or model of hardware (e.g., GPU, CPU, TPU) used for training or experimentation.
Software Dependencies	No	The paper mentions software components such as Adafactor, byte-pair encoding (BPE), and SentencePiece Unigram, but does not provide version numbers for these or for any other libraries used in the experiments.
Experiment Setup	Yes	We pre-trained PEGASUS_BASE with a batch size of 256 and PEGASUS_LARGE with a batch size of 8192. We used sinusoidal positional encoding following Vaswani et al. (2017). For optimization, both pre-training and finetuning used Adafactor (Shazeer & Stern, 2018) with square root learning rate decay and dropout rate of 0.1. We used greedy decoding for studies in Section 6.1, and used beam search with a length penalty, α, as in Wu et al. (2016) for the final large model. All experiments' hyperparameters can be found in Appendix C and reported numbers are in Appendix D and E. PEGASUS_BASE had L = 12, H = 768, F = 3072, A = 12 and PEGASUS_LARGE had L = 16, H = 1024, F = 4096, A = 16.
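
The fallback rule quoted in the Dataset Splits row can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function name plan_splits, the seeded shuffle, and the use of plain lists are all assumptions, and the paper may well derive its splits differently (for example through TensorFlow Datasets split slicing).

```python
import random


def plan_splits(examples, validation=None, test=None, seed=0):
    """Return (train, validation, test) lists under the quoted fallback rule.

    Hypothetical helper: the name, the seeded shuffle, and the list-based
    interface are assumptions made for illustration.
    """
    train = list(examples)

    # All splits provided: use them unchanged.
    if validation is not None and test is not None:
        return train, list(validation), list(test)

    # The quoted rule does not say what to do in this case.
    if validation is not None and test is None:
        raise ValueError("rule does not cover: validation given, test missing")

    # Assumption: shuffle deterministically before cutting.
    rng = random.Random(seed)
    rng.shuffle(train)
    n = len(train)
    n_val = n // 10

    if test is None:
        # No split provided at all: 80/10/10 train/validation/test.
        n_test = n // 10
        n_train = n - n_val - n_test
        return (
            train[:n_train],
            train[n_train:n_train + n_val],
            train[n_train + n_val:],
        )

    # Test split provided, validation missing: hold out 10% of train.
    return train[n_val:], train[:n_val], list(test)
```

Calling plan_splits(range(100)) yields 80, 10, and 10 examples; calling it with a test split but no validation split holds out one tenth of the training examples as validation and leaves the test split unchanged.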