DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization

Authors: Jiaxin Shi, Chen Liang, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, our model not only achieves state-of-the-art ROUGE scores on the CNN/Daily Mail dataset, but also shows strong robustness in the out-of-domain test on the DUC 2007 test set. Moreover, our model reaches a ROUGE-1 F-1 score of 39.41 on the CNN/Daily Mail test set with merely 1/100 of the training set, demonstrating tremendous data efficiency.
Researcher Affiliation | Academia | Jiaxin Shi¹, Chen Liang¹, Lei Hou¹, Juanzi Li¹, Zhiyuan Liu¹, Hanwang Zhang²; ¹Tsinghua University, ²Nanyang Technological University. {shijx12,lliangchenc}@gmail.com, {houlei,lijuanzi,liuzy}@tsinghua.edu.cn, hanwangzhang@ntu.edu.sg
Pseudocode | Yes | Algorithm 1 (Greedy Extraction Algorithm). Input: document D = {d1, d2, ..., d|D|}, a well-pretrained channel model P(D|S), expected summary length l. Output: optimal summary S. Procedure: S ← {}; while |S| < l: set d*, p* ← nil, 0; for each di ∈ D \ S, compute pi = P(D | S ∪ {di}) according to Formula 3, and if pi > p* then d*, p* ← di, pi; after the inner loop, S ← S ∪ {d*}. Finally, re-sort S based on the sentence order in D and return S. (See the Python sketch after this table.)
Open Source Code | Yes | The implementation is made publicly available at https://github.com/lliangchenc/DeepChannel
Open Datasets | Yes | We evaluate our model on two datasets: CNN/Daily Mail (Hermann et al. 2015; Nallapati et al. 2016; See, Liu, and Manning 2017; Hsu et al. 2018) and DUC 2007.
Dataset Splits | Yes | We follow (Hsu et al. 2018) and obtain the non-anonymized version of this dataset, which has 287,113 training pairs, 13,368 validation pairs, and 11,490 test pairs.
Hardware Specification | Yes | To obtain the results in Table 2, DeepChannel only needs to be trained for one epoch on the CNN/Daily Mail training set, taking about four hours on an Nvidia GTX 1080Ti GPU.
Software Dependencies | No | The paper mentions using GloVe word embeddings, the Adam optimizer, and GRUs, but does not provide version numbers for these or for any other software libraries or programming languages used.
Experiment Setup | Yes | For the model, we set the dimension of the word embedding to 300 and the GRU hidden dimension to 1024. We use a 3-layered MLP to calculate P(di|S) in Formula 2, consisting of 3 linear layers, 2 ReLU layers, and an output sigmoid layer. We use dropout (Srivastava et al. 2014) with probability 0.3 after the word embedding layer and before the first layer of the MLP. ... We use the Adam (Kingma and Ba 2014) optimizer with a fixed learning rate of 1e-5 to train our model. We set the weight of the penalization term α = 0.001. When extracting sentences, we fix the number of target sentences (i.e., l in Algorithm 1) to 3.
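The greedy extraction step quoted in the Pseudocode row can be written as a short routine. Below is a minimal Python sketch of Algorithm 1; the `channel_prob` callable standing in for the pretrained channel model P(D|S) (Formula 3) is a hypothetical placeholder, not the authors' released interface.

```python
def greedy_extract(document, channel_prob, summary_length=3):
    """Greedy extraction (Algorithm 1): pick sentences that maximize P(D|S).

    document: list of sentences in their original order.
    channel_prob: callable(document, summary) -> float; a hypothetical
        stand-in for the pretrained channel model P(D|S) from Formula 3.
    summary_length: expected number of summary sentences (l in Algorithm 1).
    """
    summary = []
    while len(summary) < summary_length:
        best_sentence, best_prob = None, 0.0
        for sentence in document:
            if sentence in summary:
                continue  # only consider sentences not yet selected
            prob = channel_prob(document, summary + [sentence])
            if prob > best_prob:
                best_sentence, best_prob = sentence, prob
        if best_sentence is None:
            break  # no candidate scored above zero; stop early
        summary.append(best_sentence)
    summary.sort(key=document.index)  # re-sort by original order in D
    return summary
```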
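The hyperparameters quoted in the Experiment Setup row translate directly into a model and optimizer configuration. The PyTorch sketch below only illustrates those settings (300-d word embeddings, 1024-d GRU, a 3-layer MLP with ReLU activations and a sigmoid output, dropout 0.3, Adam with learning rate 1e-5); the module structure, MLP layer widths, and vocabulary size are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the quoted hyperparameters; not the authors' code.
class SalienceScorer(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=1024, dropout=0.3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # initialized from GloVe in the paper
        self.embed_dropout = nn.Dropout(dropout)               # dropout after the embedding layer
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(                              # dropout before the first MLP layer
            nn.Dropout(dropout),
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),            # score for P(d_i | S) in [0, 1]
        )

    def forward(self, sentence_ids, summary_ids):
        # Encode one document sentence and one candidate summary, then score the pair.
        _, sent_h = self.encoder(self.embed_dropout(self.embedding(sentence_ids)))
        _, summ_h = self.encoder(self.embed_dropout(self.embedding(summary_ids)))
        pair = torch.cat([sent_h[-1], summ_h[-1]], dim=-1)
        return self.mlp(pair).squeeze(-1)

model = SalienceScorer(vocab_size=50000)  # vocabulary size is an assumed value
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```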