Exploring Human-Like Reading Strategy for Abstractive Text Summarization
Authors: Min Yang, Qiang Qu, Wenting Tu, Ying Shen, Zhou Zhao, Xiaojun Chen
AAAI 2019, pp. 7362–7369 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of HATS, we conduct extensive experiments on two real-life datasets, CNN/Daily Mail and Gigaword datasets. The experimental results demonstrate that HATS achieves impressive results on both datasets. |
| Researcher Affiliation | Academia | Min Yang (1), Qiang Qu (1), Wenting Tu (2), Ying Shen (3), Zhou Zhao (4), Xiaojun Chen (5); 1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 2 Shanghai University of Finance and Economics; 3 Peking University Shenzhen Graduate School; 4 Zhejiang University; 5 Shenzhen University. {min.yang, qiang}@siat.ac.cn, tu.wenting@mail.shufe.edu.cn, shenying@pkusz.edu.cn, zhaozhou@zju.edu.cn, xjchen@szu.edu.cn |
| Pseudocode | No | The paper does not contain a clearly labeled pseudocode block or algorithm. |
| Open Source Code | No | The paper points to https://github.com/kyunghyuncho/dl4mt-material ("Code is available at https://github.com/kyunghyuncho/dl4mt-material"), which is a third-party dl4mt tutorial repository rather than a release of the authors' HATS implementation. |
| Open Datasets | Yes | CNN/Daily Mail Corpus (Hermann et al. 2015), The Gigaword corpus is originally introduced by (Graff et al. 2003). |
| Dataset Splits | Yes | For the CNN/Daily Mail corpus: Totally, it consists of 287,226 training instances, 13,368 validation instances and 11,490 test instances. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | The only tool named is NLTK ("Each text is tokenized with a widely used natural language processing toolkit NLTK"); no version numbers or other software dependencies are specified. |
| Experiment Setup | Yes | We set both dc and dk to 200. For the convolutional layer of discriminative model D, we set the number of feature maps of CNN to 200. The width of the convolution filters is set to be 2. We first pre-train ML model for summarization with a learning rate of 0.15 (See, Liu, and Manning 2017). Then switch to HATS training using the Adam optimizer (Kingma and Ba 2014), with a mini-batch size of 16 and a learning rate of 0.001. We use the beam search with a beam size of 5 during decoding. Dropout (with the dropout rate of 0.2) and L2 regularization (with the weight decay value of 0.001) are used to avoid overfitting. |
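
The "Experiment Setup" row is the only place the training recipe appears, so the reported hyperparameters are collected below as a minimal configuration sketch. This is illustrative code, not the authors' implementation: the `HATSConfig` class and its field names are assumptions, and the Adagrad choice for the maximum-likelihood pre-training stage is inferred from the cited See, Liu, and Manning (2017) setup rather than stated explicitly in the paper.

```python
# Hypothetical consolidation of the hyperparameters quoted in the
# "Experiment Setup" row; names are illustrative, not the paper's code.
from dataclasses import dataclass


@dataclass
class HATSConfig:
    # Embedding / hidden dimensions ("We set both dc and dk to 200.")
    d_c: int = 200
    d_k: int = 200

    # Discriminator CNN ("number of feature maps of CNN to 200",
    # "width of the convolution filters is set to be 2")
    cnn_feature_maps: int = 200
    cnn_filter_width: int = 2

    # Stage 1: ML pre-training with learning rate 0.15; the optimizer is not
    # named in the paper, Adagrad is assumed from See, Liu, and Manning (2017).
    pretrain_optimizer: str = "adagrad"
    pretrain_lr: float = 0.15

    # Stage 2: HATS training with Adam (Kingma and Ba 2014)
    optimizer: str = "adam"
    lr: float = 0.001
    batch_size: int = 16

    # Decoding and regularization
    beam_size: int = 5
    dropout: float = 0.2
    l2_weight_decay: float = 0.001


if __name__ == "__main__":
    # Print the full recipe for a quick reproducibility check.
    print(HATSConfig())
```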