Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

Authors: Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that SP-SEARCH effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior. We also conduct a simulation study to show that Summarization Programs improve the interpretability of summarization models by allowing humans to better simulate model reasoning. ... We experiment with two English single-document summarization datasets, CNN/Daily Mail (Hermann et al., 2015) and XSum (Narayan et al., 2018).
Researcher Affiliation | Academia | Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal, Department of Computer Science, University of North Carolina at Chapel Hill, {swarna,shiyue,peter,mbansal}@cs.unc.edu
Pseudocode | Yes | SP-SEARCH is outlined in Algorithm 1. ... Algorithm 1: SP-SEARCH Algorithm (An illustrative search sketch follows the table.)
Open Source Code | Yes | Supporting code is available at https://github.com/swarnaHub/SummarizationPrograms.
Open Datasets | Yes | We experiment with two English single-document summarization datasets, CNN/Daily Mail (Hermann et al., 2015) and XSum (Narayan et al., 2018). ... The CNN/Daily Mail and XSum datasets are also publicly available at https://huggingface.co/datasets/cnn_dailymail and https://huggingface.co/datasets/xsum, respectively. (A loading and subsampling sketch follows the table.)
Dataset Splits | Yes | In particular, we use SP-SEARCH to identify Summarization Programs for all training samples in the CNN/Daily Mail dataset (Hermann et al., 2015). ... We conduct experiments with 1000 random validation samples. ... We compare our SP generation models (Joint and Extract-and-Build) with the following baselines and oracles on the CNN/Daily Mail test set.
Hardware Specification | Yes | average search time per sample (on a single RTX 2080 Ti GPU)
Software Dependencies | No | We build our models on top of the Hugging Face transformers library (Wolf et al., 2020). The paper does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | All models are trained for 40000 steps with a batch size of 16, a learning rate of 3 × 10^-5, and warmup steps of 500. We set the maximum input length to 512 and the maximum generation length to 100. During inference, we generate up to Top-10 Summarization Programs with beam search and output the first well-formed program. We also set the minimum generation length to 10 to prevent the model from generating too-short sequences, and the repetition penalty to 2. (A configuration sketch follows the table.)
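
The Pseudocode row points to Algorithm 1 (SP-SEARCH), which this record does not reproduce. Purely as an illustration of searching over compositions of neural modules, the sketch below runs a generic greedy search: it repeatedly applies stand-in modules (toy `fuse`, `compress`, and `paraphrase` functions) to candidate sentences and keeps whichever candidate best matches a target summary sentence under a simple unigram-F1 score. The module implementations, the scorer, and the depth limit are all assumptions; this is not the authors' SP-SEARCH.

```python
# Illustrative only: a generic greedy search over compositions of stand-in
# "modules", NOT the paper's SP-SEARCH (Algorithm 1). The toy modules, the
# unigram-F1 scorer, and the depth limit are assumptions for this sketch.
from collections import Counter
from itertools import combinations


def unigram_f1(candidate: str, target: str) -> float:
    """Token-overlap F1, used here as a cheap stand-in for ROUGE."""
    c, t = Counter(candidate.lower().split()), Counter(target.lower().split())
    overlap = sum((c & t).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(c.values()), overlap / sum(t.values())
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for neural modules, so the sketch stays self-contained.
def compress(sent: str) -> str:
    words = sent.split()
    return " ".join(words[: max(1, len(words) // 2)])


def paraphrase(sent: str) -> str:
    return sent  # identity placeholder


def fuse(a: str, b: str) -> str:
    return a + " " + b


def greedy_module_search(doc_sentences, target_sentence, max_depth=2):
    """Grow a candidate pool by applying modules; return the best-scoring candidate."""
    pool = list(doc_sentences)
    for _ in range(max_depth):
        new_candidates = [m(s) for s in pool for m in (compress, paraphrase)]
        new_candidates += [fuse(a, b) for a, b in combinations(pool, 2)]
        pool.extend(new_candidates)
    return max(pool, key=lambda s: unigram_f1(s, target_sentence))


if __name__ == "__main__":
    doc = ["The festival drew record crowds this year.",
           "Organizers credited the new venue for the turnout."]
    target = "Record crowds were credited to the new venue."
    print(greedy_module_search(doc, target))
```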
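
The Open Datasets and Dataset Splits rows point to the Hugging Face copies of CNN/Daily Mail and XSum and mention a 1000-sample validation subset. The following is a minimal loading-and-subsampling sketch with the `datasets` library; the "3.0.0" configuration name for CNN/Daily Mail and the shuffling seed are assumptions not stated in the paper.

```python
# Minimal sketch: pull the two benchmark corpora from the Hugging Face Hub
# and draw a 1000-example validation subset. The "3.0.0" config name and the
# seed are assumptions; they are not specified in the paper.
from datasets import load_dataset

cnn_dm = load_dataset("cnn_dailymail", "3.0.0")  # splits: train / validation / test
xsum = load_dataset("xsum")                      # may need extra arguments on newer library versions

# 1000 random validation samples, mirroring the quoted evaluation setup;
# the seed here is arbitrary.
val_subset = cnn_dm["validation"].shuffle(seed=42).select(range(1000))
print(len(cnn_dm["train"]), len(val_subset), len(cnn_dm["test"]))
```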
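
The Experiment Setup row reads naturally as a Hugging Face fine-tuning and decoding configuration, consistent with the transformers dependency noted above. The sketch below maps the quoted hyperparameters onto `Seq2SeqTrainingArguments` and `generate()`; the model checkpoint name and anything not quoted in the row (e.g., how well-formedness of a program is checked) are placeholders or assumptions.

```python
# Sketch only: the quoted hyperparameters mapped onto the Hugging Face
# transformers API. The checkpoint is a placeholder; unquoted details are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainingArguments

checkpoint = "facebook/bart-large"  # placeholder; the paper's exact checkpoint may differ
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

training_args = Seq2SeqTrainingArguments(
    output_dir="sp-generation",
    max_steps=40_000,                # "trained for 40000 steps"
    per_device_train_batch_size=16,  # "batch size of 16" (assumed per device)
    learning_rate=3e-5,              # "learning rate of 3 × 10^-5"
    warmup_steps=500,                # "warmup steps of 500"
)

# Decoding: up to Top-10 programs via beam search, bounded lengths, and a
# repetition penalty of 2, as quoted in the Experiment Setup row.
inputs = tokenizer("source document text goes here", truncation=True,
                   max_length=512, return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=10,
    num_return_sequences=10,
    max_length=100,
    min_length=10,
    repetition_penalty=2.0,
)
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
# The paper then outputs the first well-formed program among these candidates
# (the well-formedness check itself is not shown here).
```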