Learning to Extract Coherent Summary via Deep Reinforcement Learning
Authors: Yuxiang Wu, Baotian Hu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that the proposed neural coherence model can efficiently capture cross-sentence coherence patterns. The experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset. |
| Researcher Affiliation | Academia | Hong Kong University of Science and Technology, Hong Kong (ywubw@cse.ust.hk); University of Massachusetts Medical School, MA, USA (Baotian.Hu@umassmed.edu) |
| Pseudocode | Yes | Algorithm 1: overall training algorithm of the RNES model, where α is the learning rate and χ is a placeholder sentence for bootstrapping the coherence score of the first extracted sentence. (A hedged sketch of this training loop appears after the table.) |
| Open Source Code | No | No statement regarding the release of open-source code or a link to a code repository was found in the paper. |
| Open Datasets | Yes | We use the CNN/Daily Mail dataset originally introduced by Hermann et al. (2015) to evaluate our model. |
| Dataset Splits | Yes | It contains 287,226 documents for training, 13,368 documents for validation and 11,490 documents for test. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x) were mentioned in the paper. |
| Experiment Setup | Yes | In our experiments, we use 64-dimensional word embeddings which are randomly initialized and fine-tuned during supervised training. All convolutional kernels have size 3. The first convolution layer has 128 filters; the second and third have 256 and 512 filters respectively. Each convolution layer is followed by a max-pooling layer over non-overlapping 2×2 windows. The final two fully-connected layers have 512 and 256 hidden units respectively. The maximum sentence length is 50; longer sentences are truncated and shorter ones are zero-padded. The coherence model is trained with stochastic gradient descent (SGD) with batch size 64 and learning rate 0.1. For the NES/RNES model, we use 128-dimensional word embeddings and a vocabulary of 150,000 words. The convolution kernels have sizes 3, 5 and 7 with 128, 256 and 256 filters respectively. We set the hidden state size of the sentence-level GRU to 256 and the document representation size to 512. The MLP has two layers, with 512 and 256 hidden units respectively. We fix the maximum sentence length to 50 and the maximum number of sentences per document to 80; sentences and documents exceeding these limits are truncated. The model is trained with SGD with batch size 64. We explored λ = 1.0, 0.1, 0.01, 0.005, and use w1 = 0.4, w2 = 1.0, wl = 0.5 to ensure balanced enhancement. At test time, the model produces summaries by beam search with beam size 10. (Sketches of the coherence model and the extractor under these settings follow the table.) |
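
For concreteness, here is a minimal PyTorch sketch of the coherence model as described in the setup above: three convolution layers with 3×3 kernels and 128/256/512 filters, each followed by non-overlapping 2×2 max-pooling, then 512- and 256-unit fully-connected layers. The excerpt does not spell out how a sentence pair becomes a 2-D input, so the pairwise interaction grid below is an assumption, as are the ReLU activations and padding choices.

```python
import torch
import torch.nn as nn

class CoherenceModel(nn.Module):
    """Sketch of the pairwise neural coherence scorer. The 2-D
    interaction-grid input construction is an assumption; the layer
    sizes follow the experiment-setup description above."""

    def __init__(self, vocab_size, emb_dim=64, max_len=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Three conv blocks with 3x3 kernels and 128/256/512 filters,
        # each followed by non-overlapping 2x2 max-pooling: 50 -> 25 -> 12 -> 6.
        self.convs = nn.Sequential(
            nn.Conv2d(2 * emb_dim, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Two fully-connected layers (512/256 units) and a scalar score head.
        self.mlp = nn.Sequential(
            nn.Linear(512 * 6 * 6, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, sent_a, sent_b):
        # sent_a, sent_b: (batch, 50) token ids, padded/truncated to length 50.
        ea = self.embed(sent_a)                       # (batch, 50, emb)
        eb = self.embed(sent_b)                       # (batch, 50, emb)
        # Pairwise interaction grid: concatenate the embedding of every word
        # in sent_a with every word in sent_b (an assumption of this sketch).
        grid = torch.cat(
            [ea.unsqueeze(2).expand(-1, -1, eb.size(1), -1),
             eb.unsqueeze(1).expand(-1, ea.size(1), -1, -1)], dim=-1)
        grid = grid.permute(0, 3, 1, 2)               # (batch, 2*emb, 50, 50)
        feat = self.convs(grid).flatten(1)            # (batch, 512*6*6)
        return self.mlp(feat).squeeze(-1)             # coherence score per pair
```

Under the stated settings, this model would be trained with `torch.optim.SGD(model.parameters(), lr=0.1)` on batches of 64 sentence pairs.

The RNES extractor's hyperparameters are similarly concrete, and the sketch below assembles them into a simplified scorer: 1-D convolutions with kernel sizes 3/5/7 and 128/256/256 filters encode each sentence, a bidirectional sentence-level GRU with hidden size 256 yields 512-dimensional states, and a two-layer 512/256-unit MLP scores each sentence. Note that the actual RNES makes sequential extraction decisions conditioned on previously extracted sentences; scoring sentences independently here, and deriving the 512-dimensional document representation by mean-pooling, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """CNN sentence encoder: 1-D convolutions with kernel sizes 3/5/7 and
    128/256/256 filters, max-pooled over time (the pooling is an assumption)."""

    def __init__(self, vocab_size=150_000, emb_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList([
            nn.Conv1d(emb_dim, filters, k, padding=k // 2)
            for k, filters in [(3, 128), (5, 256), (7, 256)]
        ])

    def forward(self, tokens):                        # (n_sents, 50) token ids
        x = self.embed(tokens).transpose(1, 2)        # (n_sents, emb, 50)
        return torch.cat(
            [torch.relu(c(x)).max(dim=2).values for c in self.convs],
            dim=1)                                    # (n_sents, 640)

class RNESExtractor(nn.Module):
    """Simplified extractor sketch: sentence-level bidirectional GRU
    (hidden 256, so 512-dim states), a 512-dim document representation,
    and a two-layer 512/256-unit MLP scoring each sentence."""

    def __init__(self, vocab_size=150_000):
        super().__init__()
        self.sent_enc = SentenceEncoder(vocab_size)
        self.gru = nn.GRU(640, 256, batch_first=True, bidirectional=True)
        self.doc_proj = nn.Linear(512, 512)
        self.mlp = nn.Sequential(
            nn.Linear(512 + 512, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, doc_tokens):                    # (n_sents <= 80, 50)
        sents = self.sent_enc(doc_tokens)             # (n_sents, 640)
        h, _ = self.gru(sents.unsqueeze(0))           # (1, n_sents, 512)
        h = h.squeeze(0)                              # (n_sents, 512)
        # Document representation by mean-pooling the GRU states (assumed).
        doc = torch.tanh(self.doc_proj(h.mean(0)))    # (512,)
        feats = torch.cat([h, doc.expand(h.size(0), -1)], dim=1)
        return torch.sigmoid(self.mlp(feats)).squeeze(-1)  # extraction probs
```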
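
Finally, Algorithm 1 (the overall RNES training loop noted in the Pseudocode row) can be sketched as a REINFORCE update in which the reward mixes a ROUGE score against the reference with λ-weighted pairwise coherence, and the placeholder sentence χ precedes the first extracted sentence. The `doc` container, the `rouge_reward` helper (combining ROUGE-1/2/L, plausibly with the w1/w2/wl weights above) and `coherence.score` are hypothetical conveniences, not the paper's API; omitting a reward baseline is also an assumption of this sketch.

```python
import torch

def reinforce_step(extractor, coherence, optimizer, doc, reference,
                   lam=0.01, placeholder="."):
    """One REINFORCE update sketching Algorithm 1 under stated assumptions.
    `doc.tokens` / `doc.sentences`, `rouge_reward`, and `coherence.score`
    are hypothetical helpers; lam=0.01 is one of the explored λ values."""
    # Sample an extraction decision for every sentence in the document.
    probs = extractor(doc.tokens)                 # (n_sents,) extraction probs
    dist = torch.distributions.Bernoulli(probs)
    actions = dist.sample()                       # 1.0 = extract sentence i

    summary = [s for s, a in zip(doc.sentences, actions) if a == 1]

    # Coherence part of the reward: pairwise scores between consecutive
    # extracted sentences; the placeholder χ precedes the first one.
    coh, prev = 0.0, placeholder
    for s in summary:
        coh += float(coherence.score(prev, s))    # fixed reward, no gradient
        prev = s

    reward = rouge_reward(summary, reference) + lam * coh  # hypothetical helper

    # REINFORCE: maximize E[reward] by descending -reward * log p(actions).
    loss = -reward * dist.log_prob(actions).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```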
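The pretrained coherence model enters this loop only through the reward (hence the `float(...)` detachment in the sketch), which matches the two-stage setup described above: supervised training of the coherence scorer first, then reinforcement learning of the extractor.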