Faithful to the Original: Fact Aware Neural Abstractive Summarization
Authors: Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement on informativeness since they often condense the meaning of the source text. To verify the effectiveness of FTSum, we conduct extensive experiments on the Gigaword sentence summarization benchmark dataset (Rush, Chopra, and Weston 2015b). |
| Researcher Affiliation | Collaboration | Ziqiang Cao (1,2), Furu Wei (3), Wenjie Li (1,2), Sujian Li (4). (1) Department of Computing, The Hong Kong Polytechnic University, Hong Kong; (2) Hong Kong Polytechnic University Shenzhen Research Institute, China; (3) Microsoft Research, Beijing, China; (4) Key Laboratory of Computational Linguistics, Peking University, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions external scripts and frameworks used (e.g., 'the script released by (Rush, Chopra, and Weston 2015b)' and 'the popular s2s framework dl4mt as the starting point') but does not provide a link to, or an explicit release statement for, the code of its own proposed model (FTSum). |
| Open Datasets | Yes | We conduct experiments on the Annotated English Gigaword corpus, as with (Rush, Chopra, and Weston 2015b). |
| Dataset Splits | Yes | The training and development datasets are built with the script released by (Rush, Chopra, and Weston 2015b). The script also performs various basic text normalization, including tokenization, lower-casing, replacing all digit characters with #, and masking words that appear fewer than 5 times with a UNK tag. It comes up with about 3.8M sentence-headline pairs as the training set and 189K pairs as the development set. We use the same Gigaword test set as (Rush, Chopra, and Weston 2015b). (A hedged preprocessing sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'dl4mt' and 'Stanford CoreNLP' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the learning rate α = 0.001 and the mini-batch size to 32. Similar to (Zhou et al. 2017), we evaluate the model performance on the development set for every 2000 batches and halve the learning rate if the cost increases for 10 consecutive validations. In addition, we apply gradient clipping (Pascanu, Mikolov, and Bengio 2013) with range [-5, 5] during training to enhance the stability of the model. ... the size of word embeddings to 200. We initialize word embeddings with GloVe (Pennington, Socher, and Manning 2014). All the GRU hidden state dimensions are fixed to 400. We use dropout (Srivastava et al. 2014) with probability p = 0.5. During decoding, we use beam search of size 6 to generate the summary, and restrict the maximal length of a summary to 20 words. (A training-configuration sketch with these hyper-parameters follows the table.) |
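
The normalization quoted under "Dataset Splits" (tokenization, lower-casing, replacing digits with #, and masking words seen fewer than 5 times with a UNK tag) can be illustrated with a minimal sketch. This is not the released script: the whitespace tokenizer, the `<unk>` tag spelling, and the toy sentences are assumptions made for illustration.

```python
import re
from collections import Counter

def normalize(line: str) -> list[str]:
    """Lower-case, tokenize (naive whitespace split here), and replace every digit with '#'."""
    tokens = line.lower().split()  # the released script uses its own tokenizer
    return [re.sub(r"\d", "#", tok) for tok in tokens]

def build_vocab(corpus: list[list[str]], min_count: int = 5) -> set[str]:
    """Keep words appearing at least `min_count` times; the rest will be masked with UNK."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}

def mask_rare(tokens: list[str], vocab: set[str], unk: str = "<unk>") -> list[str]:
    """Replace out-of-vocabulary words with the UNK tag."""
    return [tok if tok in vocab else unk for tok in tokens]

if __name__ == "__main__":
    raw = ["Stocks fell 3 percent on Monday .", "Stocks rose 2 percent on Tuesday ."]
    corpus = [normalize(s) for s in raw]
    # the paper masks words appearing fewer than 5 times; min_count=1 keeps this toy example readable
    vocab = build_vocab(corpus, min_count=1)
    print([mask_rare(sent, vocab) for sent in corpus])
```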
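
Likewise, the settings quoted under "Experiment Setup" can be collected into a short training-loop sketch. The hyper-parameter values are those reported in the paper; the choice of PyTorch and Adam, and the `model`, `train_batches`, and `dev_cost` placeholders, are assumptions for illustration rather than the authors' implementation.

```python
import torch
from torch import nn

# Hyper-parameters quoted from the paper.
LEARNING_RATE = 1e-3      # alpha = 0.001
BATCH_SIZE = 32
VALIDATE_EVERY = 2000     # validate on the development set every 2000 batches
PATIENCE = 10             # consecutive cost increases before halving the learning rate
CLIP_RANGE = 5.0          # gradients clipped element-wise to [-5, 5]
EMB_DIM, HID_DIM, DROPOUT = 200, 400, 0.5   # GloVe-initialized embeddings, GRU hidden size, dropout
BEAM_SIZE, MAX_LEN = 6, 20                  # decoding settings (decoder itself not shown here)

def train(model: nn.Module, train_batches, dev_cost):
    """Sketch of the schedule: `model(batch)` is assumed to return the training loss,
    and `dev_cost(model)` the cost on the development set."""
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # optimizer choice is an assumption
    prev_cost, worse_in_a_row = float("inf"), 0
    for step, batch in enumerate(train_batches, start=1):
        loss = model(batch)
        optimizer.zero_grad()
        loss.backward()
        # element-wise gradient clipping to the range [-5, 5]
        nn.utils.clip_grad_value_(model.parameters(), CLIP_RANGE)
        optimizer.step()

        if step % VALIDATE_EVERY == 0:
            cost = dev_cost(model)
            if cost > prev_cost:
                worse_in_a_row += 1
                if worse_in_a_row >= PATIENCE:
                    # halve the learning rate after 10 consecutive cost increases
                    for group in optimizer.param_groups:
                        group["lr"] *= 0.5
                    worse_in_a_row = 0
            else:
                worse_in_a_row = 0
            prev_cost = cost
```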