Contextualized Rewriting for Text Summarization

Authors: Guangsheng Bao, Yue Zhang
Pages: 12544-12553

AAAI 2021

Reproducibility assessment — each entry lists the variable, the assessed result, and the supporting quote (LLM response):
Research Type: Experimental
    "Our models are evaluated on the CNN/DM dataset (Hermann et al. 2015). Results show that the contextualized rewriter gives significantly improved ROUGE (Lin 2004) scores compared with a state-of-the-art extractive baseline, outperforming a traditional rewriter baseline by a large margin."
Researcher Affiliation: Academia
    "Guangsheng Bao (1,2), Yue Zhang (1,2); (1) School of Engineering, Westlake University; (2) Institute of Advanced Technology, Westlake Institute for Advanced Study; {baoguangsheng, zhangyue}@westlake.edu.cn"
Pseudocode: No
    The paper does not contain any pseudocode or explicitly labeled algorithm blocks.
Open Source Code: Yes
    "We release our code at https://github.com/baoguangsheng/ctx-rewriter-for-summ.git."
Open Datasets: Yes
    "We evaluate our model on the CNN/Daily Mail dataset (Hermann et al. 2015), which comprises online news articles with several human written highlights (on average 3.75 per article)."
Dataset Splits: Yes
    "We use the non-anonymized version and follow the standard splitting of Hermann et al. (2015), which includes 287,227 samples for training, 13,368 for dev testing, and 11,490 for testing."
Hardware Specification: Yes
    "The model is trained with 2 v100 GPUs for about 9 hours. (...) We train the model with 2 GPUs on a v100 machine for about 60 hours."
Software Dependencies: Yes
    "All scores are calculated using pyrouge" (footnote: https://pypi.org/project/pyrouge/0.1.3/).
Experiment Setup: Yes
    "For inference, we select sentences according to the hyperparameters min_sel = 3, max_sel = 5 and threshold = 0.35, which are chosen by a grid search to find the best average score of ROUGE-1/2/L on the dev dataset. (...) The encoder and extractor are jointly trained for a total of 50,000 steps with a learning rate schedule (...) where warmup = 10,000."
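The quoted setup can be sketched in code. This is a minimal reconstruction, not the authors' implementation: the tie-breaking in sentence selection, the exact learning-rate formula, and d_model = 768 are assumptions (the schedule below follows the common Transformer inverse-square-root warmup recipe, which the page does not confirm).

```python
def select_sentences(scores, min_sel=3, max_sel=5, threshold=0.35):
    """Hypothetical sketch of threshold-based sentence selection.

    Keeps every sentence whose extractor score is >= threshold, but
    never fewer than min_sel nor more than max_sel, filling/trimming
    by score rank. Returns indices in document order.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    above = [i for i in ranked if scores[i] >= threshold]
    n = min(max(len(above), min_sel), max_sel, len(scores))
    return sorted(ranked[:n])  # restore original document order


def warmup_lr(step, warmup=10_000, d_model=768):
    """Assumed inverse-sqrt schedule with linear warmup (step >= 1):
    lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).
    Peaks at step == warmup, then decays as step^-0.5.
    """
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

For example, with scores [0.9, 0.1, 0.5, 0.4, 0.2] exactly three sentences clear the 0.35 threshold, so indices [0, 2, 3] are selected; with all-low scores the top min_sel sentences are kept anyway.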