Exploring Human-Like Reading Strategy for Abstractive Text Summarization

Authors: Min Yang, Qiang Qu, Wenting Tu, Ying Shen, Zhou Zhao, Xiaojun Chen

AAAI 2019, pp. 7362-7369 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of HATS, we conduct extensive experiments on two real-life datasets, CNN/Daily Mail and Gigaword. The experimental results demonstrate that HATS achieves impressive results on both datasets.
Researcher Affiliation | Academia | Min Yang (1), Qiang Qu (1), Wenting Tu (2), Ying Shen (3), Zhou Zhao (4), Xiaojun Chen (5): (1) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; (2) Shanghai University of Finance and Economics; (3) Peking University Shenzhen Graduate School; (4) Zhejiang University; (5) Shenzhen University. Emails: {min.yang, qiang}@siat.ac.cn, tu.wenting@mail.shufe.edu.cn, shenying@pkusz.edu.cn, zhaozhou@zju.edu.cn, xjchen@szu.edu.cn
Pseudocode | No | The paper does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | No official HATS implementation is released; the only repository the paper points to is a third-party codebase ("Code is available at https://github.com/kyunghyuncho/dl4mt-material").
Open Datasets | Yes | CNN/Daily Mail corpus (Hermann et al. 2015); the Gigaword corpus was originally introduced by Graff et al. (2003).
Dataset Splits | Yes | In total, the CNN/Daily Mail corpus consists of 287,226 training instances, 13,368 validation instances, and 11,490 test instances (see the dataset-loading sketch after the table).
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models.
Software Dependencies | No | The only software dependency mentioned is NLTK, a widely used natural language processing toolkit used to tokenize each text; no versions are given (see the tokenization sketch after the table).
Experiment Setup | Yes | We set both d_c and d_k to 200. For the convolutional layer of the discriminative model D, we set the number of CNN feature maps to 200 and the width of the convolution filters to 2. We first pre-train the ML model for summarization with a learning rate of 0.15 (See, Liu, and Manning 2017), then switch to HATS training with the Adam optimizer (Kingma and Ba 2014), a mini-batch size of 16, and a learning rate of 0.001. We use beam search with a beam size of 5 during decoding. Dropout (with a rate of 0.2) and L2 regularization (with a weight decay of 0.001) are used to avoid overfitting. (See the configuration sketch after the table.)
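
Dataset-loading sketch. The paper does not describe how it obtains the CNN/Daily Mail corpus; the snippet below is a minimal sketch using the Hugging Face datasets library (our assumption, not the authors' tooling) to fetch the corpus and check the split sizes quoted above. The hosted "3.0.0" release may differ slightly from the authors' preprocessing.

    # Load the non-anonymized CNN/Daily Mail release and print the split sizes.
    # The Hugging Face "cnn_dailymail"/"3.0.0" configuration is an assumption; counts may
    # differ slightly from the 287,226 / 13,368 / 11,490 instances reported in the paper.
    from datasets import load_dataset

    cnn_dm = load_dataset("cnn_dailymail", "3.0.0")
    for split in ("train", "validation", "test"):
        print(split, len(cnn_dm[split]))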
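Tokenization sketch. The preprocessing is described only as NLTK tokenization; the snippet below shows that step under our assumptions (the example sentence and lowercasing are not specified in the paper).

    # Sketch of the NLTK tokenization step mentioned in the paper; the exact
    # preprocessing (lowercasing, truncation, vocabulary construction) is not specified.
    import nltk
    from nltk.tokenize import word_tokenize

    nltk.download("punkt", quiet=True)  # tokenizer model required by word_tokenize
    text = "Police arrested five anti-nuclear protesters on Thursday."
    print(word_tokenize(text.lower()))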
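Configuration sketch. To make the reported hyperparameters easier to reuse, here is a hedged sketch that collects them in one place; the field names, the dataclass, the placeholder model, and the PyTorch optimizer wiring are our assumptions, not the authors' code.

    # Hyperparameters as reported in the Experiment Setup row; everything else
    # (field names, placeholder model, PyTorch wiring) is assumed for illustration.
    from dataclasses import dataclass
    import torch

    @dataclass
    class HATSConfig:
        d_c: int = 200               # dimension d_c
        d_k: int = 200               # dimension d_k
        cnn_feature_maps: int = 200  # discriminator CNN feature maps
        cnn_filter_width: int = 2
        pretrain_lr: float = 0.15    # ML pre-training learning rate
        adam_lr: float = 0.001       # HATS training learning rate (Adam)
        batch_size: int = 16
        beam_size: int = 5
        dropout: float = 0.2
        l2_weight_decay: float = 0.001

    cfg = HATSConfig()
    model = torch.nn.Linear(cfg.d_c, cfg.d_c)  # placeholder; not the actual HATS model
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.adam_lr,
                                 weight_decay=cfg.l2_weight_decay)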