Joint Parsing and Generation for Abstractive Summarization
Authors: Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu
AAAI 2020, pp. 8894-8901
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines. Experiments are performed on a variety of summarization datasets to demonstrate the effectiveness of the proposed method. |
| Researcher Affiliation | Collaboration | Kaiqiang Song,1 Logan Lebanoff,1 Qipeng Guo,2 Xipeng Qiu,2 Xiangyang Xue,2 Chen Li,3 Dong Yu,3 Fei Liu1 (1Computer Science Department, University of Central Florida; 2School of Computer Science, Fudan University; 3Tencent AI Lab, Bellevue, WA) |
| Pseudocode | No | The paper describes the model architecture and process in narrative text and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our implementation and models publicly available at https://github.com/ucfnlp/joint-parse-n-summarize |
| Open Datasets | Yes | We experiment with GIGAWORD (Parker 2011) and NEWSROOM (Grusky, Naaman, and Artzi 2018). The CNN/DM dataset (Hermann et al. 2015) has been extensively studied. In this work we present a novel use of a newly released dataset Web Split (Narayan et al. 2017). |
| Dataset Splits | Yes | In Table 3, we provide statistics of all datasets used in this study. (\|y\| / Train / Dev / Test) GIGAWORD: 8.41 / 4,020,581 / 4,096 / 1,951; NEWSROOM: 10.18 / 199,341 / 21,530 / 21,382; CNN/DM-R: 13.89 / 472,872 / 25,326 / 20,122; WEBMERGE: 31.43 / 1,331,515 / 40,879 / 43,934. |
| Hardware Specification | No | The paper specifies training parameters and model configurations but does not provide any specific hardware details such as GPU or CPU models, or cloud computing instance types used for experiments. |
| Software Dependencies | No | The paper mentions using the Stanford parser (Chen and Manning 2014) but does not provide specific version numbers for it or any other software libraries or frameworks used. |
| Experiment Setup | Yes | We create an input vocabulary that contains words appearing 5 times or more in the dataset; the output vocabulary contains the most frequent 10k words. We set all LSTM hidden states to be 256 dimensions. During training, we use a batch size of 64 and Adam (Kingma and Ba 2015) for parameter optimization, with lr=1e-3, betas=[0.9,0.999], and eps=1e-8. We apply gradient clipping of [-5,5] and a weight decay of 1e-6. At decoding time, we apply beam search with reference (Tan, Wan, and Xiao 2017) to generate summary sequences, with beam size K=10. |
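The "Experiment Setup" row pins down the optimizer and decoding hyperparameters concretely. As a minimal sketch, assuming a PyTorch implementation (the authors' released code at the GitHub link above is the authoritative reference), the reported settings map onto a configuration like the one below; the `model` stub and the `compute_loss` placeholder are hypothetical stand-ins, not the authors' joint parse-and-summarize architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the summarization model; the paper's actual
# architecture (joint parsing and generation) is not reproduced here.
model = nn.LSTM(input_size=256, hidden_size=256)  # all LSTM hidden states are 256-dim

# Adam with the hyperparameters reported in the "Experiment Setup" row.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=1e-6,
)

def training_step(batch, compute_loss):
    """One parameter update; batch size 64 would be set in the data loader."""
    optimizer.zero_grad()
    loss = compute_loss(model, batch)  # compute_loss is a placeholder for the model's loss
    loss.backward()
    # Gradient clipping to [-5, 5], as reported.
    nn.utils.clip_grad_value_(model.parameters(), clip_value=5.0)
    optimizer.step()
    return loss.item()

BEAM_SIZE = 10  # decoding uses beam search (with reference) and K = 10
```

This only illustrates how the stated hyperparameters would be wired up; the beam-search-with-reference decoder itself is not shown.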