Learning to Extract Coherent Summary via Deep Reinforcement Learning
Authors: Yuxiang Wu, Baotian Hu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that the proposed neural coherence model can efficiently capture cross-sentence coherence patterns. The experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset. |
| Researcher Affiliation | Academia | Hong Kong University of Science and Technology, Hong Kong (ywubw@cse.ust.hk); University of Massachusetts Medical School, MA, USA (Baotian.Hu@umassmed.edu) |
| Pseudocode | Yes | Algorithm 1: overall training algorithm of the RNES model, where α is the learning rate and χ is a placeholder sentence for bootstrapping the coherence score of the first extracted sentence. (A hedged sketch of this training loop appears after the table.) |
| Open Source Code | No | No statement regarding the release of open-source code or a link to a code repository was found in the paper. |
| Open Datasets | Yes | We use the CNN/Daily Mail dataset originally introduced by Hermann et al. (2015) to evaluate our model. |
| Dataset Splits | Yes | It contains 287,226 documents for training, 13,368 documents for validation and 11,490 documents for test. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x) were mentioned in the paper. |
| Experiment Setup | Yes | In our experiments, we use 64-dimensional word embeddings which are randomly initialized and fine-tuned during supervised training. All convolutional kernels have size 3. The first convolution layer has 128 filters; the second and third have 256 and 512 filters respectively. Each convolution layer is followed by a max-pooling layer over non-overlapping 2×2 windows. The final two fully-connected layers have 512 and 256 hidden units respectively. The maximum sentence length is 50; longer sentences are truncated and shorter ones are zero-padded. The coherence model is trained with stochastic gradient descent (SGD) with batch size 64 and learning rate 0.1. For the NES/RNES model, we use 128-dimensional word embeddings and a vocabulary of 150,000 words. The convolution kernels have sizes 3, 5 and 7 with 128, 256 and 256 filters respectively. We set the hidden state size of the sentence-level GRU to 256 and the document representation size to 512. The MLP has two layers, with 512 and 256 hidden units respectively. We fix the maximum sentence length to 50 and the maximum number of sentences per document to 80; sentences and documents exceeding these limits are truncated. The model is trained with SGD with batch size 64. We explored λ = 1.0, 0.1, 0.01, 0.005, and use w1 = 0.4, w2 = 1.0, wl = 0.5 to ensure balanced enhancement. At test time, the model produces summaries by beam search with beam size 10. (Sketches of the coherence model and the extractor under these settings follow the table.) |
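
For concreteness, here is a minimal PyTorch sketch of the coherence model as described in the setup above: three convolution layers with 3×3 kernels and 128/256/512 filters, each followed by non-overlapping 2×2 max-pooling, then 512- and 256-unit fully-connected layers. The excerpt does not spell out how a sentence pair becomes a 2-D input, so the pairwise interaction grid below is an assumption, as are the ReLU activations and padding choices.

```python
import torch
import torch.nn as nn

class CoherenceModel(nn.Module):
    """Sketch of the pairwise neural coherence scorer. The 2-D
    interaction-grid input construction is an assumption; the layer
    sizes follow the experiment-setup description above."""

    def __init__(self, vocab_size, emb_dim=64, max_len=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Three conv blocks with 3x3 kernels and 128/256/512 filters,
        # each followed by non-overlapping 2x2 max-pooling: 50 -> 25 -> 12 -> 6.
        self.convs = nn.Sequential(
            nn.Conv2d(2 * emb_dim, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Two fully-connected layers (512/256 units) and a scalar score head.
        self.mlp = nn.Sequential(
            nn.Linear(512 * 6 * 6, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, sent_a, sent_b):
        # sent_a, sent_b: (batch, 50) token ids, padded/truncated to length 50.
        ea = self.embed(sent_a)                       # (batch, 50, emb)
        eb = self.embed(sent_b)                       # (batch, 50, emb)
        # Pairwise interaction grid: concatenate the embedding of every word
        # in sent_a with every word in sent_b (an assumption of this sketch).
        grid = torch.cat(
            [ea.unsqueeze(2).expand(-1, -1, eb.size(1), -1),
             eb.unsqueeze(1).expand(-1, ea.size(1), -1, -1)], dim=-1)
        grid = grid.permute(0, 3, 1, 2)               # (batch, 2*emb, 50, 50)
        feat = self.convs(grid).flatten(1)            # (batch, 512*6*6)
        return self.mlp(feat).squeeze(-1)             # coherence score per pair
```

Under the stated settings, this model would be trained with `torch.optim.SGD(model.parameters(), lr=0.1)` on batches of 64 sentence pairs.

The RNES extractor's hyperparameters are similarly concrete, and the sketch below assembles them into a simplified scorer: 1-D convolutions with kernel sizes 3/5/7 and 128/256/256 filters encode each sentence, a bidirectional sentence-level GRU with hidden size 256 yields 512-dimensional states, and a two-layer 512/256-unit MLP scores each sentence. Note that the actual RNES makes sequential extraction decisions conditioned on previously extracted sentences; scoring sentences independently here, and deriving the 512-dimensional document representation by mean-pooling, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """CNN sentence encoder: 1-D convolutions with kernel sizes 3/5/7 and
    128/256/256 filters, max-pooled over time (the pooling is an assumption)."""

    def __init__(self, vocab_size=150_000, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList([
            nn.Conv1d(emb_dim, filters, k, padding=k // 2)
            for k, filters in [(3, 128), (5, 256), (7, 256)]
        ])

    def forward(self, tokens):                        # (n_sents, 50) token ids
        x = self.embed(tokens).transpose(1, 2)        # (n_sents, emb, 50)
        return torch.cat(
            [torch.relu(c(x)).max(dim=2).values for c in self.convs],
            dim=1)                                    # (n_sents, 640)

class RNESExtractor(nn.Module):
    """Simplified extractor sketch: sentence-level bidirectional GRU
    (hidden 256, so 512-dim states), a 512-dim document representation,
    and a two-layer 512/256-unit MLP scoring each sentence."""

    def __init__(self, vocab_size=150_000):
        super().__init__()
        self.sent_enc = SentenceEncoder(vocab_size)
        self.gru = nn.GRU(640, 256, batch_first=True, bidirectional=True)
        self.doc_proj = nn.Linear(512, 512)
        self.mlp = nn.Sequential(
            nn.Linear(512 + 512, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, doc_tokens):                    # (n_sents <= 80, 50)
        sents = self.sent_enc(doc_tokens)             # (n_sents, 640)
        h, _ = self.gru(sents.unsqueeze(0))           # (1, n_sents, 512)
        h = h.squeeze(0)                              # (n_sents, 512)
        # Document representation by mean-pooling the GRU states (assumed).
        doc = torch.tanh(self.doc_proj(h.mean(0)))    # (512,)
        feats = torch.cat([h, doc.expand(h.size(0), -1)], dim=1)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)  # extraction probs
```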
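
Finally, Algorithm 1 (the overall RNES training loop noted in the Pseudocode row) can be sketched as a REINFORCE update in which the reward mixes a ROUGE score against the reference with λ-weighted pairwise coherence, and the placeholder sentence χ precedes the first extracted sentence. The `doc` container, the `rouge_reward` helper (combining ROUGE-1/2/L, plausibly with the w1/w2/wl weights above) and `coherence.score` are hypothetical conveniences, not the paper's API; omitting a reward baseline is also an assumption of this sketch.

```python
import torch

def reinforce_step(extractor, coherence, optimizer, doc, reference,
                   lam=0.01, placeholder="."):
    """One REINFORCE update sketching Algorithm 1 under stated assumptions.
    `doc.tokens` / `doc.sentences`, `rouge_reward`, and `coherence.score`
    are hypothetical helpers; lam=0.01 is one of the explored λ values."""
    # Sample an extraction decision for every sentence in the document.
    probs = extractor(doc.tokens)                 # (n_sents,) extraction probs
    dist = torch.distributions.Bernoulli(probs)
    actions = dist.sample()                       # 1.0 = extract sentence i

    summary = [s for s, a in zip(doc.sentences, actions) if a == 1]

    # Coherence part of the reward: pairwise scores between consecutive
    # extracted sentences; the placeholder χ precedes the first one.
    coh, prev = 0.0, placeholder
    for s in summary:
        coh += float(coherence.score(prev, s))    # fixed reward, no gradient
        prev = s

    reward = rouge_reward(summary, reference) + lam * coh  # hypothetical helper

    # REINFORCE: maximize E[reward] by descending -reward * log p(actions).
    loss = -reward * dist.log_prob(actions).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```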
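The pretrained coherence model enters this loop only through the reward (hence the `float(...)` detachment in the sketch), which matches the two-stage setup described above: supervised training of the coherence scorer first, then reinforcement learning of the extractor.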