From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach

Authors: Jiwei Tan, Xiaojun Wan, Jianguo Xiao

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on a large real dataset demonstrate the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task. We conduct experiments on the New York Times news corpus.
Researcher Affiliation | Academia | Jiwei Tan and Xiaojun Wan and Jianguo Xiao, Institute of Computer Science and Technology, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University; {tanjiwei, wanxiaojun, xiaojianguo}@pku.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references third-party open-source tools (sumy, Theano, GloVe) but does not provide a link or statement for the authors' own implementation of their proposed method.
Open Datasets | Yes | Previous sentence summarization models are evaluated on news articles from the English Gigaword corpus, and only the lead sentences which have significant overlap with the headlines are selected. In this paper, we conduct experiments on the 1.4 million NYT articles. We train our model on the same Gigaword dataset used in [Rush et al., 2015; Chopra et al., 2016].
Dataset Splits | No | The paper mentions an 'early stopping strategy, which stops training if the performance no longer improves on held-out training data in 20 epochs,' implying a validation set. However, it does not specify the size or percentage of this held-out data as a distinct split for reproduction purposes. (See the early-stopping sketch below the table.)
Hardware Specification | Yes | We run the model on a GTX-1080 GPU card, and it takes about one day for every 100 epochs.
Software Dependencies | No | The paper mentions 'a Python toolkit sumy' and 'the multi-sentence summarization model with theano'. While software is named, specific version numbers for these dependencies are not provided. (See the sumy example below the table.)
Experiment Setup | Yes | For the summary encoder we use three hidden layers of LSTM, and for the control layer we use one layer of LSTM, and each layer has 512 hidden units. The dimension of word vectors is 100. The learning rate of RMSProp is 0.01 and the decay and momentum are both 0.9. We use a batch size of 64 samples, and process 30,016 samples an epoch. (See the configuration sketch below the table.)
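
The early-stopping strategy quoted in the Dataset Splits row can be read as patience-based stopping on a held-out set. The following is a minimal sketch of that reading, not the authors' code; `train_one_epoch` and `evaluate` are hypothetical placeholders, and only the 20-epoch patience comes from the paper.

```python
# Minimal sketch of patience-based early stopping: stop if the held-out
# score has not improved in 20 epochs. `train_one_epoch` and `evaluate`
# are hypothetical placeholders, not functions from the paper.
def train_with_early_stopping(model, train_data, held_out_data,
                              patience=20, max_epochs=1000):
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)      # one pass over the training data
        score = evaluate(model, held_out_data)  # validation metric on held-out data
        if score > best_score:
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:  # no improvement in 20 epochs
            break
    return model
```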
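
The Software Dependencies row names the Python toolkit sumy without a version. As a hedged illustration of how sumy is typically invoked to pull summary sentences from an article, the snippet below uses the TextRank summarizer and a single output sentence; both choices are assumptions for illustration and are not taken from the paper.

```python
# Hedged example of extracting sentences from an article with sumy.
# The TextRank variant and sentences_count=1 are illustrative assumptions.
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer

article_text = "First sentence of the article. Second sentence. Third sentence."
parser = PlaintextParser.from_string(article_text, Tokenizer("english"))
summarizer = TextRankSummarizer()

for sentence in summarizer(parser.document, sentences_count=1):
    print(sentence)
```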
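
The Experiment Setup row quotes enough hyperparameters to write them down as a configuration record. The sketch below collects those values and shows one common RMSProp-with-momentum update rule consistent with the stated learning rate, decay, and momentum; the authors' exact update formulation and epsilon value are not given here, so treat the update function as an assumption.

```python
import numpy as np

# Hyperparameters quoted in the "Experiment Setup" row.
CONFIG = {
    "summary_encoder_lstm_layers": 3,
    "control_lstm_layers": 1,
    "hidden_units_per_layer": 512,
    "word_vector_dim": 100,
    "learning_rate": 0.01,
    "rmsprop_decay": 0.9,
    "rmsprop_momentum": 0.9,
    "batch_size": 64,
    "samples_per_epoch": 30016,
}

def rmsprop_momentum_step(param, grad, mean_square, velocity,
                          lr=CONFIG["learning_rate"],
                          decay=CONFIG["rmsprop_decay"],
                          momentum=CONFIG["rmsprop_momentum"],
                          eps=1e-6):
    """One common RMSProp-with-momentum update (an assumption, not the
    authors' exact formulation): scale the gradient by a running RMS of
    past gradients, then apply a momentum term."""
    mean_square = decay * mean_square + (1.0 - decay) * grad ** 2
    velocity = momentum * velocity - lr * grad / np.sqrt(mean_square + eps)
    return param + velocity, mean_square, velocity
```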