SumREN: Summarizing Reported Speech about Events in News

Authors: Revanth Gangi Reddy, Heba Elfardy, Hou Pong Chan, Kevin Small, Heng Ji

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To this end, we create a new multi-document summarization benchmark, SUMREN, comprising 745 summaries of reported statements from various public figures obtained from 633 news articles discussing 132 events. We propose an automatic silver-training data generation approach for our task, which helps smaller models like BART achieve GPT-3 level performance on this task. Finally, we introduce a pipeline-based framework for summarizing reported speech, which we empirically show to generate summaries that are more abstractive and factual than baseline query-focused summarization approaches.
Researcher Affiliation | Collaboration | Revanth Gangi Reddy^1*, Heba Elfardy^2, Hou Pong Chan^3, Kevin Small^2, Heng Ji^2. ^1 University of Illinois Urbana-Champaign; ^2 Amazon Alexa; ^3 University of Macau. revanth3@illinois.edu, {helfardy,smakevin,jihj}@amazon.com, hpchan@um.edu.mo
Pseudocode | No | The paper describes a process with steps (e.g., for summary generation and benchmark construction) and provides equations, but it does not include a formally labeled pseudocode block or algorithm.
Open Source Code | Yes | Code and data at: https://github.com/amazon-science/sumren
Open Datasets | Yes | We create a new multi-document summarization benchmark, SUMREN, comprising 745 summaries of reported statements... However, current news summarization datasets such as CNN-DM (Hermann et al. 2015), Multi-News (Fabbri et al. 2019), and Timeline100 (Li et al. 2021) largely disregard summarizing these reported statements. To bridge this gap, we introduce the new task of Summarizing Reported speech about Events in News and create a new benchmark, SUMREN, for this task.
Dataset Splits | Yes | Our benchmark has 745 examples in total, with a train/dev/test split of 235/104/406 respectively.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. It mentions using a BERT encoder but not the specific hardware it ran on.
Software Dependencies | No | The paper mentions various models and frameworks (e.g., BART, GPT-3, BERT, LoRA, FactCC), but it does not specify software versions for programming languages, libraries, or dependencies required for reproduction (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Finally, the model is trained with a multi-task learning objective by using a joint loss that performs a weighted sum of the classification binary cross entropy (BCE) and the sequence labeling head cross entropy (CE) losses: $\mathcal{L} = \alpha \, \mathrm{BCE}(y_{cls}, \hat{y}_{cls}) + \beta \, \mathrm{CE}(Y^{sp}, \hat{Y}^{sp})$, where $y_{cls}$ and $\hat{y}_{cls}$ correspond to the predicted and ground-truth classification labels respectively, $Y^{sp}$ and $\hat{Y}^{sp}$ denote the predicted and ground-truth token labels respectively, and $\alpha$ and $\beta$ are tunable hyper-parameters. In our experiments, we set $\alpha$ to 1 and $\beta$ to 0.4.
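
The quoted joint loss is simple to express in code. Below is a minimal PyTorch sketch of this weighted multi-task objective, assuming a sentence-level binary classification head and a token-level sequence-labeling head; the function name, tensor shapes, and head layout are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def joint_loss(cls_logits: torch.Tensor, cls_labels: torch.Tensor,
               token_logits: torch.Tensor, token_labels: torch.Tensor,
               alpha: float = 1.0, beta: float = 0.4) -> torch.Tensor:
    """Weighted sum of classification BCE and sequence-labeling CE,
    mirroring L = alpha * BCE + beta * CE with alpha=1, beta=0.4
    as reported in the paper. Shapes below are assumptions."""
    # Binary cross entropy for the classification head.
    # cls_logits: (batch,), cls_labels: (batch,) with values in {0, 1}.
    bce = F.binary_cross_entropy_with_logits(cls_logits, cls_labels.float())

    # Cross entropy for the token-level sequence-labeling head.
    # token_logits: (batch, seq_len, num_tags), token_labels: (batch, seq_len).
    ce = F.cross_entropy(token_logits.reshape(-1, token_logits.size(-1)),
                         token_labels.reshape(-1))

    return alpha * bce + beta * ce
```

As a design note, folding both heads into one scalar loss lets a single backward pass update the shared encoder, with $\beta = 0.4$ down-weighting the token-labeling signal relative to the classification signal.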