Neural Abstractive Summarization with Structural Attention
Authors: Tanya Chowdhury, Sachin Kumar, Tanmoy Chakraborty
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the performance of the models on the basis of ROUGE-1, 2 and L F1-scores [Lin, 2004] on the CNN/Dailymail (Table 2), CQA and Multinews (Table 3) datasets. |
| Researcher Affiliation | Academia | Tanya Chowdhury (IIIT-Delhi, India), Sachin Kumar (Carnegie Mellon University, USA), Tanmoy Chakraborty (IIIT-Delhi, India) |
| Pseudocode | No | The paper describes the model architecture and computations in prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is public at https://bit.ly/35i7q93. |
| Open Datasets | Yes | (i) The CNN/Dailymail dataset [Hermann et al., 2015; Nallapati et al., 2016]; (ii) we also use the CQA dataset [Chowdhury and Chakraborty, 2019]. Additionally, we include analysis on Multi-News [Fabbri et al., 2019]. |
| Dataset Splits | Yes | The scripts released by [Nallapati et al., 2016] are used to extract approximately 250k training pairs, 13k validation pairs and 11.5k test pairs from the corpora. ... We split the 100k dataset into 80k training instances, 10k validation and 10k test instances. |
| Hardware Specification | Yes | GPU: GeForce 2080 Ti |
| Software Dependencies | No | The paper mentions 'Optimizer Adagrad' but does not provide version numbers for the software or libraries used. |
| Experiment Setup | Yes | Vocabulary size: 50,000; Input embedding dim: 128; Training decoder steps: 100; Learning rate: 0.15; Optimizer: Adagrad; Adagrad init accumulator: 0.1; Max gradient norm (for clipping): 2.0; Max decoding steps (beam-search decoding): 120; Min decoding steps (beam-search decoding): 35; Beam search width: 4; Weight of coverage loss: 1 (collected into a hedged config sketch below the table) |
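The Research Type row quotes the paper's evaluation protocol: models are compared on ROUGE-1, 2 and L F1-scores [Lin, 2004]. A minimal scoring sketch, assuming the `rouge-score` Python package (the paper does not name its scoring tool) and two placeholder summary strings:

```python
# ROUGE-1/2/L F1 scoring sketch. The metrics match those reported in the
# paper, but the `rouge-score` package and the example strings below are
# assumptions used here purely for illustration.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "police killed the gunman after a lengthy standoff"  # placeholder reference summary
candidate = "the gunman was killed by police"                    # placeholder model output

scores = scorer.score(reference, candidate)
for metric, result in scores.items():
    print(f"{metric}: F1 = {result.fmeasure:.4f}")
```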
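The Dataset Splits row reports that the 100k CQA instances are divided into 80k training, 10k validation and 10k test instances. A toy sketch of such a split, assuming a seeded shuffle-then-slice scheme (the paper only reports the resulting sizes, not how the split was drawn):

```python
# Toy 80k/10k/10k split of a 100k-instance dataset. The shuffling scheme
# and seed are assumptions; only the split sizes come from the paper.
import random

def split_dataset(instances, n_train=80_000, n_val=10_000, seed=0):
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Example with 100k placeholder instances -> 80k train, 10k val, 10k test.
train, val, test = split_dataset(list(range(100_000)))
print(len(train), len(val), len(test))
```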
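The Experiment Setup row is easier to scan when the flattened parameter table is collected into a single config object. A sketch in plain Python follows; the dictionary keys are illustrative names chosen here, not identifiers from the released code, while the values are the ones reported above:

```python
# Hyperparameters reported in the paper's experiment-setup table.
# Key names are assumptions for readability; values are as reported.
hparams = {
    "vocab_size": 50_000,
    "input_embedding_dim": 128,
    "training_decoder_steps": 100,
    "learning_rate": 0.15,
    "optimizer": "Adagrad",
    "adagrad_init_accumulator": 0.1,
    "max_gradient_norm": 2.0,      # for gradient clipping
    "max_decoding_steps": 120,     # beam-search decoding
    "min_decoding_steps": 35,      # beam-search decoding
    "beam_search_width": 4,
    "coverage_loss_weight": 1.0,
}
```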