SumREN: Summarizing Reported Speech about Events in News

Authors: Revanth Gangi Reddy, Heba Elfardy, Hou Pong Chan, Kevin Small, Heng Ji

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To this end, we create a new multi-document summarization benchmark, SUMREN, comprising 745 summaries of reported statements from various public figures obtained from 633 news articles discussing 132 events. We propose an automatic silver-training data generation approach for our task, which helps smaller models like BART achieve GPT-3 level performance on this task. Finally, we introduce a pipeline-based framework for summarizing reported speech, which we empirically show to generate summaries that are more abstractive and factual than baseline query-focused summarization approaches.
Researcher Affiliation | Collaboration | Revanth Gangi Reddy^1*, Heba Elfardy^2, Hou Pong Chan^3, Kevin Small^2, Heng Ji^2. ^1 University of Illinois Urbana-Champaign; ^2 Amazon Alexa; ^3 University of Macau. revanth3@illinois.edu, {helfardy,smakevin,jihj}@amazon.com, hpchan@um.edu.mo
Pseudocode | No | The paper describes a process with steps (e.g., for summary generation and benchmark construction) and provides equations, but it does not include a formally labeled pseudocode block or algorithm.
Open Source Code | Yes | Code and data at: https://github.com/amazon-science/sumren
Open Datasets | Yes | We create a new multi-document summarization benchmark, SUMREN, comprising 745 summaries of reported statements... However, current news summarization datasets such as CNN-DM (Hermann et al. 2015), Multi-News (Fabbri et al. 2019), and Timeline100 (Li et al. 2021) largely disregard summarizing these reported statements. To bridge this gap, we introduce the new task of Summarizing Reported speech about Events in News and create a new benchmark, SUMREN, for this task.
Dataset Splits | Yes | Our benchmark has 745 examples in total, with a train/dev/test split of 235/104/406 respectively.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. It mentions using a BERT encoder but not the specific hardware it ran on.
Software Dependencies | No | The paper mentions various models and frameworks (e.g., BART, GPT-3, BERT, LoRA, FactCC), but it does not specify software versions for programming languages, libraries, or dependencies required for reproduction (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Finally, the model is trained with a multi-task learning objective by using a joint loss that performs a weighted sum of the classification binary cross entropy (BCE) and the sequence labeling head cross entropy (CE) losses: $\mathcal{L} = \alpha \, \mathrm{BCE}(y_{cls}, \hat{y}_{cls}) + \beta \, \mathrm{CE}(Y^{sp}, \hat{Y}^{sp})$, where $y_{cls}$ and $\hat{y}_{cls}$ correspond to the predicted and ground-truth classification labels respectively, $Y^{sp}$ and $\hat{Y}^{sp}$ denote the predicted and ground-truth token labels respectively, and $\alpha$ and $\beta$ are tunable hyper-parameters. In our experiments, we set $\alpha$ to 1 and $\beta$ to 0.4.
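
The quoted joint loss is simple to express in code. Below is a minimal PyTorch sketch of this weighted multi-task objective, assuming a sentence-level binary classification head and a token-level sequence-labeling head; the function name, tensor shapes, and head layout are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def joint_loss(cls_logits: torch.Tensor, cls_labels: torch.Tensor,
               token_logits: torch.Tensor, token_labels: torch.Tensor,
               alpha: float = 1.0, beta: float = 0.4) -> torch.Tensor:
    """Weighted sum of classification BCE and sequence-labeling CE,
    mirroring L = alpha * BCE + beta * CE with alpha=1, beta=0.4
    as reported in the paper. Shapes below are assumptions."""
    # Binary cross entropy for the classification head.
    # cls_logits: (batch,), cls_labels: (batch,) with values in {0, 1}.
    bce = F.binary_cross_entropy_with_logits(cls_logits, cls_labels.float())

    # Cross entropy for the token-level sequence-labeling head.
    # token_logits: (batch, seq_len, num_tags), token_labels: (batch, seq_len).
    ce = F.cross_entropy(token_logits.reshape(-1, token_logits.size(-1)),
                         token_labels.reshape(-1))

    return alpha * bce + beta * ce
```

As a design note, folding both heads into one scalar loss lets a single backward pass update the shared encoder, with $\beta = 0.4$ down-weighting the token-labeling signal relative to the classification signal.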