SemSUM: Semantic Dependency Guided Neural Abstractive Summarization
Authors: Hanqi Jin, Tianming Wang, Xiaojun Wan (pp. 8026–8033)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on the English Gigaword, DUC 2004 and MSR abstractive sentence summarization datasets. Experiments show that the proposed model improves semantic relevance and reduces content deviation, and also brings significant improvements on automatic evaluation ROUGE metrics. |
| Researcher Affiliation | Academia | Hanqi Jin¹²³, Tianming Wang¹³, Xiaojun Wan¹²³ — ¹Wangxuan Institute of Computer Technology, Peking University; ²Center for Data Science, Peking University; ³The MOE Key Laboratory of Computational Linguistics, Peking University. {jinhanqi, wangtm, wanxiaojun}@pku.edu.cn |
| Pseudocode | No | The paper presents architectural diagrams and mathematical formulations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/zhongxia96/SemSUM. |
| Open Datasets | Yes | We experiment with the English Gigaword dataset (Napoles, Gormley, and Durme 2012), the DUC 2004 dataset (Over, Dang, and Harman 2007) and the MSR-ATC Test Set (Toutanova et al. 2016). The Gigaword dataset contains about 3.8M sentence-summary pairs for training and 189K pairs for development. For test, we use the standard test set of 1951 sentence-summary pairs. ... (Footnote: all the training, validation and test datasets can be downloaded at https://github.com/harvardnlp/sent-summary.) |
| Dataset Splits | Yes | The Gigaword dataset contains about 3.8M sentence-summary pairs for training and 189K pairs for development. For test, we use the standard test set of 1951 sentence-summary pairs. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as the GPU or CPU models used for running the experiments; nothing beyond generic mentions of computation is given. |
| Software Dependencies | No | The paper mentions using the fairseq toolkit but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We set our model parameters based on preliminary experiments on the development set. We prune the vocabulary to 50k and use the word in the source sentence with maximum weight in copy attention to replace the unknown word to solve the OOV problem. We set the dimension of word embeddings and hidden units dmodel to 512, feed-forward units to 2048. We set 4 heads for multi-head graph-attention and 8 heads for multi-head self-attention, masked multi-head self-attention and multi-head cross-attention. We set the number of layers of sentence encoder L1, graph encoder L2, and summary decoder L3 to 4, 3 and 6, respectively. We set dropout rate to 0.1 and use the Adam optimizer with an initial learning rate α = 0.0001, momentum β1 = 0.9, β2 = 0.999 and weight decay ϵ = 10⁻⁵. The learning rate is halved if the valid loss on the development set increases for two consecutive epochs. We use a mini-batch size of 300. Beam search with beam size of 5 is used for decoding. (See the configuration sketch below the table.) |
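For readers checking the Experiment Setup row, here is a minimal sketch that expresses the quoted hyperparameters in plain PyTorch. It is an illustration only, not the authors' fairseq implementation: the `config` keys and the stand-in parameter list are hypothetical, the mapping of the paper's ϵ = 10⁻⁵ "weight decay" onto Adam's `weight_decay` argument is an assumption, and `ReduceLROnPlateau` only approximates the described rule of halving the learning rate after two consecutive epochs of rising development loss.

```python
import torch

# Hyperparameters as quoted in the Experiment Setup row (key names are ours).
config = dict(
    vocab_size=50_000,            # vocabulary pruned to 50k
    d_model=512,                  # word embedding / hidden size
    ffn_dim=2048,                 # feed-forward units
    graph_attn_heads=4,           # multi-head graph-attention
    other_attn_heads=8,           # self-, masked self-, and cross-attention
    sentence_encoder_layers=4,    # L1
    graph_encoder_layers=3,       # L2
    decoder_layers=6,             # L3
    dropout=0.1,
    batch_size=300,
    beam_size=5,
)

# Stand-in parameter list; in practice the parameters would come from the
# SemSUM model defined in the authors' released fairseq-based code.
params = [torch.nn.Parameter(torch.zeros(config["d_model"]))]

# Adam with the quoted learning rate and momenta; weight_decay=1e-5 is our
# reading of the paper's "weight decay ϵ = 10⁻⁵".
optimizer = torch.optim.Adam(
    params, lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-5
)

# Halve the learning rate when development loss stops improving for two
# consecutive epochs (an approximation of the schedule described above).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=2
)
```

In a real training loop one would call `scheduler.step(dev_loss)` once per epoch after validation; decoding with beam size 5 is handled by fairseq's generation utilities in the released code rather than by anything shown here.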