Controlling the Amount of Verbatim Copying in Abstractive Summarization

Authors: Kaiqiang Song, Bingqing Wang, Zhe Feng, Ren Liu, Fei Liu

AAAI 2020, pp. 8902-8909 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments we illustrate the significance of our proposed method on controlling the amount of verbatim copying and achieve competitive results over strong baselines. Our method demonstrates strong performance, either outperforming or performing on par with the best published results.
Researcher Affiliation | Collaboration | Computer Science Department, University of Central Florida, Orlando, FL 32816, USA; Robert Bosch LLC, Sunnyvale, CA 94085, USA
Pseudocode | Yes | Algorithm 1: Best-First Search (see the decoding sketch below the table)
Open Source Code | Yes | We make our implementation and models publicly available at https://github.com/ucfnlp/control-over-copying
Open Datasets | Yes | We conduct experiments on the Gigaword (Parker 2011) and Newsroom (Grusky, Naaman, and Artzi 2018) datasets.
Dataset Splits | Yes | The train/valid/test splits contain 4 million/10k/1951 instances for Gigaword and 199k/21k/21k instances for Newsroom.
Hardware Specification | Yes | Each model is fine-tuned for 6 epochs; an epoch takes about 5 hours on a Tesla V100 GPU.
Software Dependencies | No | The paper mentions using a pretrained BERT-Base (uncased) model and a Transformer architecture, along with the Adam optimizer and BPE tokenization, but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup | Yes | We initialize the model parameters using the pretrained BERT-Base (uncased) model. We use the Adam optimizer with β1 = 0.9, β2 = 0.999. The learning rate is set to 4e-5 and is halved whenever the validation loss does not change after 40,000 training steps. We set the weight decay to 0.01 for regular layers and no weight decay for dropout and layer normalization. Each model is fine-tuned for 6 epochs; the batch size is set to 32. The sampling rate p is set to 0.1 for source words and 0.9 for summary words, both seen and unseen. (See the fine-tuning configuration sketch below the table.)
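The paper's Algorithm 1 is a best-first search decoder that repeatedly expands the highest-scoring partial summary. The sketch below is a minimal, generic illustration of that idea, not the authors' exact algorithm: `score_fn`, `expand_k`, and the length cap are assumed names and parameters, and the real system scores candidates with the fine-tuned BERT-based model.

```python
import heapq
from typing import Callable, List, Tuple


def best_first_search(
    score_fn: Callable[[List[int]], List[Tuple[float, int]]],
    bos_id: int,
    eos_id: int,
    max_len: int = 32,
    expand_k: int = 5,
) -> List[int]:
    """Pop the highest-scoring partial summary and expand it, repeatedly.

    `score_fn(prefix)` is assumed to return (log_prob, token_id) pairs for
    candidate next tokens; heapq is a min-heap, so scores are negated.
    """
    heap: List[Tuple[float, List[int]]] = [(0.0, [bos_id])]
    while heap:
        neg_score, prefix = heapq.heappop(heap)
        if prefix[-1] == eos_id or len(prefix) >= max_len:
            return prefix  # first completed (or length-capped) hypothesis
        # Keep only the top-k continuations of this prefix to bound the queue.
        for logp, tok in sorted(score_fn(prefix), reverse=True)[:expand_k]:
            heapq.heappush(heap, (neg_score - logp, prefix + [tok]))
    return []


# Toy usage with a hypothetical scorer over vocabulary {0: BOS, 1: "a", 2: EOS}.
toy_scores = lambda prefix: [(-0.1, 2), (-0.5, 1)]
print(best_first_search(toy_scores, bos_id=0, eos_id=2))  # -> [0, 2]
```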
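The experiment-setup row condenses several optimization choices: BERT-Base (uncased) initialization, Adam with β1 = 0.9 and β2 = 0.999, a 4e-5 learning rate halved when validation loss stalls, 0.01 weight decay except for layer normalization, batch size 32, and 6 epochs. The sketch below translates those settings into PyTorch under stated assumptions: the Hugging Face `transformers` BERT encoder, the parameter-group naming, and the `ReduceLROnPlateau` scheduler (the paper halves the rate after 40,000 stagnant steps, not per validation pass) are illustrative choices, not the authors' released code.

```python
import torch
from transformers import BertModel

# Initialize from the pretrained BERT-Base (uncased) checkpoint, as in the paper.
model = BertModel.from_pretrained("bert-base-uncased")

# 0.01 weight decay for regular layers; none for layer-norm/bias
# (dropout has no trainable parameters, so it needs no special group).
no_decay = ("bias", "LayerNorm.weight")
param_groups = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
optimizer = torch.optim.Adam(param_groups, lr=4e-5, betas=(0.9, 0.999))

# Halve the learning rate when the monitored validation loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5
)

batch_size = 32   # per the paper
num_epochs = 6    # each epoch reportedly takes ~5 hours on a Tesla V100
```

In a training loop built on this sketch, `scheduler.step(val_loss)` would be called after each validation pass to trigger the halving.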