Joint Parsing and Generation for Abstractive Summarization

Authors: Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu

AAAI 2020, pp. 8894-8901

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines.
Researcher Affiliation | Collaboration | Kaiqiang Song,1 Logan Lebanoff,1 Qipeng Guo,2 Xipeng Qiu,2 Xiangyang Xue,2 Chen Li,3 Dong Yu,3 Fei Liu1 (1Computer Science Department, University of Central Florida; 2School of Computer Science, Fudan University; 3Tencent AI Lab, Bellevue, WA)
Pseudocode | No | The paper describes the model architecture and process in narrative text and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We make our implementation and models publicly available at https://github.com/ucfnlp/joint-parse-n-summarize
Open Datasets | Yes | We experiment with GIGAWORD (Parker 2011) and NEWSROOM (Grusky, Naaman, and Artzi 2018). The CNN/DM dataset (Hermann et al. 2015) has been extensively studied. In this work we present a novel use of a newly released dataset Web Split (Narayan et al. 2017).
Dataset Splits | Yes | In Table 3, we provide statistics of all datasets used in this study: GIGAWORD (|y| 8.41; train 4,020,581 / dev 4,096 / test 1,951), NEWSROOM (|y| 10.18; 199,341 / 21,530 / 21,382), CNN/DM-R (|y| 13.89; 472,872 / 25,326 / 20,122), WEBMERGE (|y| 31.43; 1,331,515 / 40,879 / 43,934).
Hardware Specification | No | The paper specifies training parameters and model configurations but does not provide specific hardware details such as GPU or CPU models, or cloud computing instance types used for the experiments.
Software Dependencies | No | The paper mentions using the Stanford parser (Chen and Manning 2014) but does not provide specific version numbers for it or for any other software libraries or frameworks used.
Experiment Setup | Yes | We create an input vocabulary containing words appearing 5 times or more in the dataset; the output vocabulary contains the most frequent 10k words. We set all LSTM hidden states to 256 dimensions. During training, we use a batch size of 64 and Adam (Kingma and Ba 2015) for parameter optimization, with lr=1e-3, betas=[0.9, 0.999], and eps=1e-8. We apply gradient clipping to [-5, 5] and a weight decay of 1e-6. At decoding time, we apply beam search with reference (Tan, Wan, and Xiao 2017) to generate summary sequences, with a beam size of K=10.
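
The snippet below is a minimal sketch of the reported optimization settings, assuming a PyTorch implementation. The model object, loss_fn, and the training_step helper are hypothetical placeholders rather than the authors' released code, and weight decay is assumed to be applied through Adam's weight_decay argument; the batch size of 64 and beam search decoding with K=10 are not shown here.

import torch
import torch.nn as nn

# Illustrative stand-in for the summarization model (256-dim LSTM states).
model = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,                 # lr = 1e-3
    betas=(0.9, 0.999),      # betas = [0.9, 0.999]
    eps=1e-8,                # eps = 1e-8
    weight_decay=1e-6,       # weight decay of 1e-6 (assumed to be Adam's L2 term)
)

def training_step(batch_inputs, batch_targets, loss_fn):
    """One hypothetical training step using the reported hyperparameters."""
    optimizer.zero_grad()
    outputs, _ = model(batch_inputs)
    loss = loss_fn(outputs, batch_targets)
    loss.backward()
    # Clip gradient values to the reported [-5, 5] range.
    nn.utils.clip_grad_value_(model.parameters(), clip_value=5.0)
    optimizer.step()
    return loss.item()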