Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Authors: Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that SP-SEARCH effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior. We also conduct a simulation study to show that Summarization Programs improve the interpretability of summarization models by allowing humans to better simulate model reasoning. ...We experiment with two English single-document summarization datasets, CNN/Daily Mail (Hermann et al., 2015) and XSum (Narayan et al., 2018). |
| Researcher Affiliation | Academia | Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal Department of Computer Science University of North Carolina at Chapel Hill {swarna,shiyue,peter,mbansal}@cs.unc.edu |
| Pseudocode | Yes | SP-SEARCH is outlined in Algorithm 1. ... Algorithm 1: SP-SEARCH Algorithm |
| Open Source Code | Yes | Supporting code available at https://github.com/swarnaHub/SummarizationPrograms. |
| Open Datasets | Yes | We experiment with two English single-document summarization datasets, CNN/Daily Mail (Hermann et al., 2015) and XSum (Narayan et al., 2018). ... The CNN/Daily Mail and XSum datasets are also publicly available at https://huggingface.co/datasets/cnn_dailymail and https://huggingface.co/datasets/xsum respectively. |
| Dataset Splits | Yes | In particular, we use SP-SEARCH to identify Summarization Programs for all training samples in the CNN/Daily Mail dataset (Hermann et al., 2015). ... We conduct experiments with 1000 random validation samples. ... We compare our SP generation models (Joint and Extract-and-Build) with the following baselines and oracles on the CNN/Daily Mail test set. |
| Hardware Specification | Yes | average search time per sample (on a single RTX 2080 Ti GPU) |
| Software Dependencies | No | We build our models on top of the Hugging Face transformers library (Wolf et al., 2020). The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | All models are trained for 40000 steps with a batch size of 16, learning rate of 3e-5 and warmup steps of 500. We set the maximum input length to 512 and maximum generation length to 100. During inference, we generate up to Top-10 Summarization Programs with beam search and output the first well-formed program. We also set the minimum generation length to 10 to prevent the model from generating too short sequences and repetition penalty to 2. |
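
Below is a minimal, hypothetical sketch of the data loading and training configuration described in the Open Datasets and Experiment Setup rows above. It mirrors the reported hyperparameters (40000 steps, batch size 16, learning rate 3e-5, 500 warmup steps, input length 512, beam search with Top-10 outputs, minimum generation length 10, repetition penalty 2) using the Hugging Face datasets and transformers APIs that the paper builds on. The backbone model name and the dataset field names are assumptions for illustration only; the authors' actual implementation is at https://github.com/swarnaHub/SummarizationPrograms.

```python
# Hypothetical sketch: not the authors' released code.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
)

# Publicly available datasets referenced in the paper.
cnn_dm = load_dataset("cnn_dailymail", "3.0.0")
xsum = load_dataset("xsum")

# Assumed seq2seq backbone; the paper only states it builds on transformers.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

# Training setup reported in the paper: 40k steps, batch size 16,
# learning rate 3e-5, 500 warmup steps.
training_args = Seq2SeqTrainingArguments(
    output_dir="sp_model",
    max_steps=40_000,
    per_device_train_batch_size=16,
    learning_rate=3e-5,
    warmup_steps=500,
)

# Inference settings reported in the paper: up to Top-10 programs via beam
# search, max generation length 100, min length 10, repetition penalty 2.
generation_kwargs = dict(
    num_beams=10,
    num_return_sequences=10,
    max_length=100,
    min_length=10,
    repetition_penalty=2.0,
)

# Inputs truncated to the reported maximum input length of 512 tokens
# ("article" is the CNN/Daily Mail source field; XSum uses "document").
def encode(batch):
    return tokenizer(batch["article"], max_length=512, truncation=True)
```

The generation settings would be passed to `model.generate(**generation_kwargs)` at inference time; the first well-formed Summarization Program among the returned beams is then selected, per the quoted setup.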