Exploring Human-Like Reading Strategy for Abstractive Text Summarization

Authors: Min Yang, Qiang Qu, Wenting Tu, Ying Shen, Zhou Zhao, Xiaojun Chen

AAAI 2019, pp. 7362-7369 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of HATS, we conduct extensive experiments on two real-life datasets, CNN/Daily Mail and Gigaword. The experimental results demonstrate that HATS achieves impressive results on both datasets.
Researcher Affiliation | Academia | Min Yang (1), Qiang Qu (1), Wenting Tu (2), Ying Shen (3), Zhou Zhao (4), Xiaojun Chen (5): (1) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; (2) Shanghai University of Finance and Economics; (3) Peking University Shenzhen Graduate School; (4) Zhejiang University; (5) Shenzhen University. Emails: {min.yang, qiang}@siat.ac.cn, tu.wenting@mail.shufe.edu.cn, shenying@pkusz.edu.cn, zhaozhou@zju.edu.cn, xjchen@szu.edu.cn
Pseudocode | No | The paper does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | No official HATS implementation is released; the only repository the paper points to is a third-party codebase ("Code is available at https://github.com/kyunghyuncho/dl4mt-material").
Open Datasets | Yes | CNN/Daily Mail corpus (Hermann et al. 2015); the Gigaword corpus was originally introduced by Graff et al. (2003).
Dataset Splits | Yes | In total, the CNN/Daily Mail corpus consists of 287,226 training instances, 13,368 validation instances, and 11,490 test instances (see the dataset-loading sketch after the table).
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models.
Software Dependencies | No | The only software dependency mentioned is NLTK, a widely used natural language processing toolkit used to tokenize each text; no versions are given (see the tokenization sketch after the table).
Experiment Setup | Yes | We set both d_c and d_k to 200. For the convolutional layer of the discriminative model D, we set the number of CNN feature maps to 200 and the width of the convolution filters to 2. We first pre-train the ML model for summarization with a learning rate of 0.15 (See, Liu, and Manning 2017), then switch to HATS training with the Adam optimizer (Kingma and Ba 2014), a mini-batch size of 16, and a learning rate of 0.001. We use beam search with a beam size of 5 during decoding. Dropout (with a rate of 0.2) and L2 regularization (with a weight decay of 0.001) are used to avoid overfitting. (See the configuration sketch after the table.)
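
Dataset-loading sketch. The paper does not describe how it obtains the CNN/Daily Mail corpus; the snippet below is a minimal sketch using the Hugging Face datasets library (our assumption, not the authors' tooling) to fetch the corpus and check the split sizes quoted above. The hosted "3.0.0" release may differ slightly from the authors' preprocessing.

    # Load the non-anonymized CNN/Daily Mail release and print the split sizes.
    # The Hugging Face "cnn_dailymail"/"3.0.0" configuration is an assumption; counts may
    # differ slightly from the 287,226 / 13,368 / 11,490 instances reported in the paper.
    from datasets import load_dataset

    cnn_dm = load_dataset("cnn_dailymail", "3.0.0")
    for split in ("train", "validation", "test"):
        print(split, len(cnn_dm[split]))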
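Tokenization sketch. The preprocessing is described only as NLTK tokenization; the snippet below shows that step under our assumptions (the example sentence and lowercasing are not specified in the paper).

    # Sketch of the NLTK tokenization step mentioned in the paper; the exact
    # preprocessing (lowercasing, truncation, vocabulary construction) is not specified.
    import nltk
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)  # tokenizer model required by word_tokenize
    text = "Police arrested five anti-nuclear protesters on Thursday."
    print(word_tokenize(text.lower()))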
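Configuration sketch. To make the reported hyperparameters easier to reuse, here is a hedged sketch that collects them in one place; the field names, the dataclass, the placeholder model, and the PyTorch optimizer wiring are our assumptions, not the authors' code.

    # Hyperparameters as reported in the Experiment Setup row; everything else
    # (field names, placeholder model, PyTorch wiring) is assumed for illustration.
    from dataclasses import dataclass
    import torch

    @dataclass
    class HATSConfig:
        d_c: int = 200               # dimension d_c
        d_k: int = 200               # dimension d_k
        cnn_feature_maps: int = 200  # discriminator CNN feature maps
        cnn_filter_width: int = 2
        pretrain_lr: float = 0.15    # ML pre-training learning rate
        adam_lr: float = 0.001       # HATS training learning rate (Adam)
        batch_size: int = 16
        beam_size: int = 5
        dropout: float = 0.2
        l2_weight_decay: float = 0.001

    cfg = HATSConfig()
    model = torch.nn.Linear(cfg.d_c, cfg.d_c)  # placeholder; not the actual HATS model
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.adam_lr,
                                 weight_decay=cfg.l2_weight_decay)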