From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach
Authors: Jiwei Tan, Xiaojun Wan, Jianguo Xiao
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on a large real dataset demonstrate the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task. We conduct experiments on the New York Times news corpus. |
| Researcher Affiliation | Academia | Jiwei Tan and Xiaojun Wan and Jianguo Xiao Institute of Computer Science and Technology, Peking University The MOE Key Laboratory of Computational Linguistics, Peking University {tanjiwei, wanxiaojun, xiaojianguo}@pku.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references third-party open-source tools (sumy, Theano, GloVe) but does not provide a link or statement for the authors' own implementation code for their proposed method. |
| Open Datasets | Yes | Previous sentence summarization models are evaluated on news articles from the English Gigaword corpus, and only the lead sentences which have significant overlap with the headlines are selected. In this paper, we conduct experiments on the 1.4 million NYT articles. We train our model on the same Gigaword dataset used in [Rush et al., 2015; Chopra et al., 2016]. |
| Dataset Splits | No | The paper mentions an 'early stopping strategy, which stops training if the performance no longer improves on held-out training data in 20 epochs,' implying a validation set. However, it does not specify the size or proportion of this held-out data, so the train/validation split cannot be reproduced. |
| Hardware Specification | Yes | We run the model on a GTX-1080 GPU card, and it takes about one day for every 100 epochs. |
| Software Dependencies | No | The paper mentions 'a Python toolkit sumy' and 'the multi-sentence summarization model with theano'. While software is named, specific version numbers for these dependencies are not provided. |
| Experiment Setup | Yes | For the summary encoder we use three hidden layers of LSTM, and for the control layer we use one layer of LSTM, and each layer has 512 hidden units. The dimension of word vectors is 100. The learning rate of RMSProp is 0.01 and the decay and momentum are both 0.9. We use a batch size of 64 samples, and process 30,016 samples an epoch. |
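
The Experiment Setup row reports enough hyperparameters to reconstruct the basic training configuration. The following is a minimal sketch, written in PyTorch for illustration only (the authors implemented their model in Theano, and their code is not released): a three-layer LSTM summary encoder and a one-layer LSTM control layer with 512 hidden units per layer, 100-dimensional word vectors, RMSProp with learning rate 0.01 and decay/momentum of 0.9, batch size 64, and the 20-epoch early-stopping patience mentioned under Dataset Splits. All class, variable, and vocabulary-size choices below are assumptions, not the authors' code.

```python
# Illustrative sketch of the reported training configuration.
# NOT the authors' implementation (which used Theano); names and vocab size are assumed.
import torch
import torch.nn as nn

EMBED_DIM = 100          # "The dimension of word vectors is 100."
HIDDEN_DIM = 512         # "each layer has 512 hidden units"
BATCH_SIZE = 64          # "a batch size of 64 samples"
SAMPLES_PER_EPOCH = 30016  # "process 30,016 samples an epoch"
PATIENCE = 20            # early stopping after 20 epochs without improvement on held-out data


class CoarseToFineSketch(nn.Module):
    """Stand-in for the summary encoder + control layer described in the Experiment Setup row."""

    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, EMBED_DIM)
        # "For the summary encoder we use three hidden layers of LSTM"
        self.summary_encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, num_layers=3, batch_first=True)
        # "for the control layer we use one layer of LSTM"
        self.control_layer = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM, num_layers=1, batch_first=True)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)
        encoded, _ = self.summary_encoder(embedded)
        controlled, _ = self.control_layer(encoded)
        return controlled


model = CoarseToFineSketch(vocab_size=50000)  # vocabulary size is an assumption
# "The learning rate of RMSProp is 0.01 and the decay and momentum are both 0.9."
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01, alpha=0.9, momentum=0.9)
```

The decoder, attention mechanism, and coarse-to-fine training procedure are omitted here because the quoted excerpts do not specify them; the sketch only fixes the hyperparameters the paper states explicitly.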