Controlling the Amount of Verbatim Copying in Abstractive Summarization
Authors: Kaiqiang Song, Bingqing Wang, Zhe Feng, Ren Liu, Fei Liu
AAAI 2020, pp. 8902-8909 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments we illustrate the significance of our proposed method on controlling the amount of verbatim copying and achieve competitive results over strong baselines. Our method demonstrates strong performance, either outperforming or performing on par with the best published results. |
| Researcher Affiliation | Collaboration | Computer Science Department, University of Central Florida, Orlando, FL 32816, USA; Robert Bosch LLC, Sunnyvale, CA 94085, USA |
| Pseudocode | Yes | Algorithm 1 Best-First Search (see the decoding sketch after the table) |
| Open Source Code | Yes | We make our implementation and models publicly available at https://github.com/ucfnlp/control-over-copying |
| Open Datasets | Yes | We conduct experiments on the Gigaword (Parker 2011) and Newsroom (Grusky, Naaman, and Artzi 2018) datasets. |
| Dataset Splits | Yes | The train/valid/test splits contain 4 million/10k/1951 instances for Gigaword and 199k/21k/21k instances for Newsroom. |
| Hardware Specification | Yes | Each model is fine-tuned for 6 epochs; an epoch takes about 5 hours on a Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions using a 'pretrained BERTBASE (uncased) model' and a 'Transformer architecture,' along with the Adam optimizer and BPE tokenization. However, it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We initialize the model parameters using pretrained BERTBASE (uncased) model. We use the Adam optimizer with β1 = 0.9, β2 = 0.999. The learning rate is set to lr=4e-5 and it is halved whenever the validation loss does not change after 40,000 training steps. We set the weight decay to be 0.01 for regular layers and no weight decay for dropout and layer-normalization. Each model is fine-tuned for 6 epochs. Our batch size is set to be 32. The sampling rate p is set to 0.1 for source words and 0.9 for summary words, both seen and unseen. |
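
The paper's Algorithm 1 is a best-first search decoder. The sketch below is a generic, minimal illustration of best-first search over partial hypotheses with a priority queue, not the authors' exact procedure; the `score_next_tokens` callback, the `bos_id`/`eos_id` arguments, and the `beam_width` cap are assumptions made for illustration only.

```python
import heapq
from itertools import count

def best_first_search(score_next_tokens, bos_id, eos_id, beam_width=5, max_len=30):
    """Generic best-first search decoding sketch (not the paper's exact Algorithm 1).

    `score_next_tokens(prefix)` is a hypothetical callback returning a list of
    (token_id, log_prob) pairs for the next position, sorted by log_prob descending.
    """
    tie = count()                         # tie-breaker so heapq never compares lists
    heap = [(0.0, next(tie), [bos_id])]   # entries: (negated score, tie, token prefix)
    while heap:
        neg_score, _, prefix = heapq.heappop(heap)   # pop the highest-scoring hypothesis
        if prefix[-1] == eos_id or len(prefix) >= max_len:
            return prefix, -neg_score                # first completed pop is the best one
        # Expand the most promising partial hypothesis with its top next tokens.
        # In practice a length penalty is usually added to the score; omitted for brevity.
        for token_id, log_prob in score_next_tokens(prefix)[:beam_width]:
            heapq.heappush(heap, (neg_score - log_prob, next(tie), prefix + [token_id]))
    return None, float("-inf")
```

The experiment setup above also maps onto a standard training configuration. The sketch below is a hedged approximation assuming PyTorch (the paper does not name its framework): parameters containing `LayerNorm` or `bias` in their names stand in for the layers excluded from weight decay, and `ReduceLROnPlateau` approximates the paper's rule of halving the learning rate after 40,000 steps without validation improvement.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

def build_optimizer(model: torch.nn.Module, lr=4e-5, weight_decay=0.01):
    """Optimizer/scheduler sketch mirroring the reported hyperparameters."""
    # Apply weight decay only to regular layers; exclude layer-normalization
    # and bias parameters (assumed naming convention, e.g. BERT-style modules).
    no_decay = ("LayerNorm", "layer_norm", "bias")
    decay_params = [p for n, p in model.named_parameters()
                    if not any(k in n for k in no_decay)]
    plain_params = [p for n, p in model.named_parameters()
                    if any(k in n for k in no_decay)]
    optimizer = Adam(
        [{"params": decay_params, "weight_decay": weight_decay},
         {"params": plain_params, "weight_decay": 0.0}],
        lr=lr, betas=(0.9, 0.999),
    )
    # Halve the learning rate when validation loss plateaus; this scheduler is an
    # approximation of the paper's step-count-based halving rule.
    scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=1)
    return optimizer, scheduler
```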