Faithful to the Original: Fact Aware Neural Abstractive Summarization
Authors: Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement on informativeness since they often condense the meaning of the source text. To verify the effectiveness of FTSum, we conduct extensive experiments on the Gigaword sentence summarization benchmark dataset (Rush, Chopra, and Weston 2015b). |
| Researcher Affiliation | Collaboration | Ziqiang Cao (1,2), Furu Wei (3), Wenjie Li (1,2), Sujian Li (4). (1) Department of Computing, The Hong Kong Polytechnic University, Hong Kong; (2) Hong Kong Polytechnic University Shenzhen Research Institute, China; (3) Microsoft Research, Beijing, China; (4) Key Laboratory of Computational Linguistics, Peking University, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions external scripts and frameworks used (e.g., 'the script released by (Rush, Chopra, and Weston 2015b)' and 'the popular s2s framework dl4mt as the starting point') but does not provide a link to, or an explicit release statement for, the code of its own proposed model (FTSum). |
| Open Datasets | Yes | We conduct experiments on the Annotated English Gigaword corpus, as with (Rush, Chopra, and Weston 2015b). |
| Dataset Splits | Yes | The training and development datasets are built with the script released by (Rush, Chopra, and Weston 2015b). The script also performs various basic text normalization, including tokenization, lower-casing, replacing all digit characters with #, and masking words that appear fewer than 5 times with a UNK tag. It comes up with about 3.8M sentence-headline pairs as the training set and 189K pairs as the development set. We use the same Gigaword test set as (Rush, Chopra, and Weston 2015b). (A hedged preprocessing sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'dl4mt' and 'Stanford CoreNLP' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the learning rate α = 0.001 and the mini-batch size to 32. Similar to (Zhou et al. 2017), we evaluate the model performance on the development set for every 2000 batches and halve the learning rate if the cost increases for 10 consecutive validations. In addition, we apply gradient clipping (Pascanu, Mikolov, and Bengio 2013) with range [-5, 5] during training to enhance the stability of the model. ... the size of word embeddings to 200. We initialize word embeddings with GloVe (Pennington, Socher, and Manning 2014). All the GRU hidden state dimensions are fixed to 400. We use dropout (Srivastava et al. 2014) with probability p = 0.5. During decoding, we use beam search of size 6 to generate the summary, and restrict the maximal length of a summary to 20 words. (A training-configuration sketch with these hyper-parameters follows the table.) |
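
The normalization quoted under "Dataset Splits" (tokenization, lower-casing, replacing digits with #, and masking words seen fewer than 5 times with a UNK tag) can be illustrated with a minimal sketch. This is not the released script: the whitespace tokenizer, the `<unk>` tag spelling, and the toy sentences are assumptions made for illustration.

```python
import re
from collections import Counter

def normalize(line: str) -> list[str]:
    """Lower-case, tokenize (naive whitespace split here), and replace every digit with '#'."""
    tokens = line.lower().split()  # the released script uses its own tokenizer
    return [re.sub(r"\d", "#", tok) for tok in tokens]

def build_vocab(corpus: list[list[str]], min_count: int = 5) -> set[str]:
    """Keep words appearing at least `min_count` times; the rest will be masked with UNK."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}

def mask_rare(tokens: list[str], vocab: set[str], unk: str = "<unk>") -> list[str]:
    """Replace out-of-vocabulary words with the UNK tag."""
    return [tok if tok in vocab else unk for tok in tokens]

if __name__ == "__main__":
    raw = ["Stocks fell 3 percent on Monday .", "Stocks rose 2 percent on Tuesday ."]
    corpus = [normalize(s) for s in raw]
    # the paper masks words appearing fewer than 5 times; min_count=1 keeps this toy example readable
    vocab = build_vocab(corpus, min_count=1)
    print([mask_rare(sent, vocab) for sent in corpus])
```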
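
Likewise, the settings quoted under "Experiment Setup" can be collected into a short training-loop sketch. The hyper-parameter values are those reported in the paper; the choice of PyTorch and Adam, and the `model`, `train_batches`, and `dev_cost` placeholders, are assumptions for illustration rather than the authors' implementation.

```python
import torch
from torch import nn

# Hyper-parameters quoted from the paper.
LEARNING_RATE = 1e-3      # alpha = 0.001
BATCH_SIZE = 32
VALIDATE_EVERY = 2000     # validate on the development set every 2000 batches
PATIENCE = 10             # consecutive cost increases before halving the learning rate
CLIP_RANGE = 5.0          # gradients clipped element-wise to [-5, 5]
EMB_DIM, HID_DIM, DROPOUT = 200, 400, 0.5   # GloVe-initialized embeddings, GRU hidden size, dropout
BEAM_SIZE, MAX_LEN = 6, 20                  # decoding settings (decoder itself not shown here)

def train(model: nn.Module, train_batches, dev_cost):
    """Sketch of the schedule: `model(batch)` is assumed to return the training loss,
    and `dev_cost(model)` the cost on the development set."""
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # optimizer choice is an assumption
    prev_cost, worse_in_a_row = float("inf"), 0
    for step, batch in enumerate(train_batches, start=1):
        loss = model(batch)
        optimizer.zero_grad()
        loss.backward()
        # element-wise gradient clipping to the range [-5, 5]
        nn.utils.clip_grad_value_(model.parameters(), CLIP_RANGE)
        optimizer.step()

        if step % VALIDATE_EVERY == 0:
            cost = dev_cost(model)
            if cost > prev_cost:
                worse_in_a_row += 1
                if worse_in_a_row >= PATIENCE:
                    # halve the learning rate after 10 consecutive cost increases
                    for group in optimizer.param_groups:
                        group["lr"] *= 0.5
                    worse_in_a_row = 0
            else:
                worse_in_a_row = 0
            prev_cost = cost
```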