Faithful to the Original: Fact Aware Neural Abstractive Summarization

Authors: Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement on informativeness since they often condense the meaning of the source text. To verify the effectiveness of FTSum, we conduct extensive experiments on the Gigaword sentence summarization benchmark dataset (Rush, Chopra, and Weston 2015b).
Researcher Affiliation | Collaboration | Ziqiang Cao (1,2), Furu Wei (3), Wenjie Li (1,2), Sujian Li (4); (1) Department of Computing, The Hong Kong Polytechnic University, Hong Kong; (2) Hong Kong Polytechnic University Shenzhen Research Institute, China; (3) Microsoft Research, Beijing, China; (4) Key Laboratory of Computational Linguistics, Peking University, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions external scripts and frameworks (e.g., 'the script released by (Rush, Chopra, and Weston 2015b)' and 'the popular s2s framework dl4mt as the starting point') but does not provide a link to, or an explicit statement about, open-source code for its own proposed model (FTSum).
Open Datasets | Yes | We conduct experiments on the Annotated English Gigaword corpus, as with (Rush, Chopra, and Weston 2015b).
Dataset Splits | Yes | The training and development datasets are built with the script released by (Rush, Chopra, and Weston 2015b). The script also performs basic text normalization, including tokenization, lower-casing, replacing all digit characters with #, and masking words appearing less than 5 times with a UNK tag. It comes up with about 3.8M sentence-headline pairs as the training set and 189K pairs as the development set. We use the same Gigaword test set as (Rush, Chopra, and Weston 2015b). (See the preprocessing sketch after this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper mentions using 'dl4mt' and 'Stanford CoreNLP' but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | We set the learning rate α = 0.001 and the mini-batch size to 32. Similar to (Zhou et al. 2017), we evaluate the model performance on the development set every 2000 batches and halve the learning rate if the cost increases for 10 consecutive validations. In addition, we apply gradient clipping (Pascanu, Mikolov, and Bengio 2013) with range [-5, 5] during training to enhance the stability of the model. ... the size of word embeddings to 200. We initialize word embeddings with GloVe (Pennington, Socher, and Manning 2014). All the GRU hidden state dimensions are fixed to 400. We use dropout (Srivastava et al. 2014) with probability p = 0.5. During decoding, we use beam search of size 6 to generate the summary and restrict the maximal length of a summary to 20 words. (See the training-configuration sketch after this table.)
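
The preprocessing quoted under Dataset Splits is a common source of replication drift, so the following minimal Python sketch restates those normalization steps (tokenization, lower-casing, digits to #, masking rare words with UNK). The function names and the naive whitespace tokenizer are assumptions for illustration; the paper relies on the script released by (Rush, Chopra, and Weston 2015b) rather than code like this.

```python
# Minimal sketch of the Gigaword normalization quoted above: tokenization,
# lower-casing, digits -> '#', and masking rare words with UNK. The function
# names and the naive whitespace tokenizer are illustrative; the paper uses
# the script released by Rush, Chopra, and Weston (2015b), not this code.
from collections import Counter

UNK = "UNK"
MIN_COUNT = 5  # words appearing less than 5 times are masked


def normalize(sentence: str) -> list[str]:
    """Whitespace-tokenize, lower-case, and replace every digit with '#'."""
    tokens = sentence.lower().split()
    return ["".join("#" if ch.isdigit() else ch for ch in tok) for tok in tokens]


def build_vocab(corpus: list[list[str]]) -> set[str]:
    """Keep only words that occur at least MIN_COUNT times in the corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, c in counts.items() if c >= MIN_COUNT}


def mask_rare(sentence: list[str], vocab: set[str]) -> list[str]:
    """Replace out-of-vocabulary words with the UNK tag."""
    return [tok if tok in vocab else UNK for tok in sentence]
```

Presumably the vocabulary would be built from the ~3.8M training pairs and reused for the development and test sets, although the quoted text does not say so explicitly.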
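
The Experiment Setup row packs many hyper-parameters into one quote, so the PyTorch-flavored sketch below collects them and shows the validation-driven learning-rate halving and gradient-value clipping it describes. The optimizer choice (Adam), the Config and train names, and the placeholder model and data interfaces are assumptions; the authors built on the dl4mt framework, not this code.

```python
# A PyTorch-flavored sketch of the training configuration quoted above.
# This is NOT the authors' code (they build on the dl4mt framework); the
# optimizer choice (Adam), the Config/train names, and the placeholder
# model and data interfaces are assumptions for illustration only.
from dataclasses import dataclass

import torch


@dataclass
class Config:
    embedding_dim: int = 200     # word embeddings, initialized with GloVe
    hidden_dim: int = 400        # all GRU hidden states
    dropout: float = 0.5
    learning_rate: float = 1e-3
    batch_size: int = 32
    clip_value: float = 5.0      # gradient values clipped to [-5, 5]
    validate_every: int = 2000   # batches between validations
    patience: int = 10           # consecutive cost increases before halving LR
    beam_size: int = 6           # beam search width at decoding time
    max_summary_len: int = 20    # maximal summary length in words


def train(model, train_batches, dev_cost_fn, cfg: Config):
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)
    prev_cost, bad_validations = float("inf"), 0

    for step, batch in enumerate(train_batches, start=1):
        loss = model(batch)      # placeholder forward pass returning a loss
        optimizer.zero_grad()
        loss.backward()
        # Clip each gradient value into [-clip_value, clip_value].
        torch.nn.utils.clip_grad_value_(model.parameters(), cfg.clip_value)
        optimizer.step()

        if step % cfg.validate_every == 0:
            cost = dev_cost_fn(model)    # development-set cost
            if cost > prev_cost:
                bad_validations += 1
            else:
                bad_validations = 0
            prev_cost = cost
            if bad_validations >= cfg.patience:
                # Halve the learning rate after 10 consecutive cost increases.
                for group in optimizer.param_groups:
                    group["lr"] /= 2.0
                bad_validations = 0
```

The beam_size and max_summary_len fields are decoding-time settings and are listed in the config only so the reported values live in one place.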