Submodular Span, with Applications to Conditional Data Summarization
Authors: Lilly Kumari, Jeff Bilmes (pp. 12344-12352)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical and qualitative results on three real-world tasks: conditional multi-document summarization on the DUC 2005-2007 datasets, conditional video summarization on the UT-Egocentric dataset, and conditional image corpus summarization on the ImageNet dataset. We use deep neural networks, specifically a BERT model for text, AlexNet for video frames, and Bi-directional Generative Adversarial Networks (BiGAN) for ImageNet images to help instantiate the submodular functions. The result is a minimally supervised form of conditional summarization that matches or improves over the previous state-of-the-art. (A hedged sketch of this kind of feature-based submodular instantiation appears after the table.) |
| Researcher Affiliation | Academia | Lilly Kumari, Jeff Bilmes; Department of Electrical & Computer Engineering, University of Washington, Seattle; {lkumari, bilmes}@uw.edu |
| Pseudocode | No | The paper describes algorithmic steps and refers to standard algorithms like the greedy algorithm and MMin, but does not present any pseudocode blocks or explicitly labeled algorithms. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use DUC 2005-2007 datasets which are the benchmark datasets for query-focused MDS, made available by the Document Understanding Conference. [footnote: https://duc.nist.gov] ... ImageNet-1k (Deng et al. 2009) is a large scale image database which contains nearly 1.28 million training images and 50,000 validation images. |
| Dataset Splits | Yes | We use the English uncased variant of the BERT-base model (Devlin et al. 2018) and fine-tune it for the Rouge-2 recall score prediction task using two years of DUC 2005-2007 as the training set. For example, we fine-tune the network on the DUC 2005-2006 datasets in order to extract fixed-size sentence representations for DUC 2007 (which is the test set in this example). We do not use any oracle summarization labels for the test set. ... For DUC-2005, we use DUC-2006 to tune the hyperparameters which include {l, σ, ϵ, r}. Similarly, for DUC-2006 and DUC-2007, we use DUC-2005 as the development set. (A minimal fine-tuning sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | Yes | We use the ROUGE toolkit (Lin 2004) which assesses the summary quality by counting the overlapping units such as n-grams, word sequences, and word-pairs between the candidate summary and the reference summaries. We report recall and F-measure corresponding to Rouge-1, Rouge-2, and Rouge-SU4. [footnote: ROUGE version 1.5.5 used with option -n 2 -x -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -d -l 250] (A sample invocation with these options appears after the table.) |
| Experiment Setup | Yes | For DUC-2005, we use DUC-2006 to tune the hyperparameters which include {l, σ, ϵ, r}. Similarly, for DUC-2006 and DUC-2007, we use DUC-2005 as the development set. ... For Video-1, we use Video-3 to tune the hyperparameters which include {k1, k2}; k1 and k2 are the cardinality constraints for optimizing stage one and stage two, respectively. For Video 2-4, we use Video-1 as the development set. ... In all experiments, k is set to 1000. |
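
The method instantiates submodular functions from deep feature embeddings and optimizes them greedily under cardinality constraints such as the k, k1, and k2 mentioned above. Since no code is released, the following is a minimal sketch rather than the authors' implementation: it builds a facility-location function over cosine similarities of (hypothetical) BERT-sized embeddings and maximizes it with the standard greedy algorithm referenced in the Pseudocode row. Facility location and cosine similarity are assumptions; the paper does not specify this exact instantiation.

```python
# Minimal sketch (not the authors' code): facility-location submodular
# function f(S) = sum_i max_{j in S} sim(i, j) built from feature
# embeddings, maximized by the standard greedy algorithm under a
# cardinality constraint k. Facility location and cosine similarity
# are assumptions about the instantiation.
import numpy as np

def greedy_facility_location(embeddings, k):
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T                       # pairwise cosine similarities
    n = sim.shape[0]
    selected = []
    covered = np.zeros(n)               # max_{j in S} sim(i, j) for each i
    for _ in range(min(k, n)):
        best_j, best_gain = -1, -np.inf
        for j in range(n):
            if j in selected:
                continue
            # marginal gain of adding item j to the current summary S
            gain = np.maximum(covered, sim[j]).sum() - covered.sum()
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        covered = np.maximum(covered, sim[best_j])
    return selected

# Usage: pick a 10-item summary of 100 items with 768-d (BERT-sized) vectors.
rng = np.random.default_rng(0)
print(greedy_facility_location(rng.normal(size=(100, 768)), k=10))
```

The paper's two-stage procedure would run a step like this twice, under the constraints k1 and k2 quoted above; the MMin routine it also references is a separate standard algorithm not shown here.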
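The Dataset Splits row describes fine-tuning an English uncased BERT-base model to predict Rouge-2 recall scores, training on two DUC years and testing on the third. Below is a minimal sketch of one such training step, assuming the HuggingFace transformers library; the optimizer, learning rate, and the example sentence and label are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumptions: HuggingFace transformers, AdamW, learning
# rate; the input sentence and its Rouge-2 recall label are hypothetical)
# of one training step for the Rouge-2 recall prediction task.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=1 turns the classification head into a scalar regression head;
# with float labels, the model applies an MSE loss internally.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

sentences = ["The committee approved the new policy."]  # hypothetical input
labels = torch.tensor([0.31])                 # hypothetical Rouge-2 recall

batch = tokenizer(sentences, padding=True, truncation=True,
                  return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

After fine-tuning, fixed-size sentence representations for the held-out year can presumably be read off the encoder's hidden states, consistent with the quoted setup.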
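The Software Dependencies footnote gives the exact ROUGE-1.5.5 options used for evaluation. Here is a minimal sketch of invoking the Perl toolkit with those options from Python; the script path, the `-e data` directory, and `settings.xml` (the standard ROUGE configuration listing peer and model summaries) are assumptions about a local installation.

```python
# Minimal sketch: run ROUGE-1.5.5 with the options quoted in the paper's
# footnote. Script location, the -e data directory, and settings.xml are
# assumptions about how the toolkit is installed locally.
import subprocess

cmd = [
    "perl", "ROUGE-1.5.5.pl", "-e", "data",
    "-n", "2", "-x", "-m", "-2", "4", "-u",
    "-c", "95", "-r", "1000", "-f", "A",
    "-p", "0.5", "-t", "0", "-d", "-l", "250",
    "-a", "settings.xml",
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)  # per-evaluation and average Rouge-1/2/SU4 scores
```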