reproducibilityindex.ai

Do Less, Get More: Streaming Submodular Maximization with Subsampling

Authors: Moran Feldman, Amin Karbasi, Ehsan Kazemi

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we develop the ﬁrst one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to ﬁfty-fold while maintaining practically the same utility. We also evaluated the scalability of our algorithm on a large dataset of Uber pick up locations.
Researcher Affiliation	Academia	Moran Feldman Open University of Israel moranfe@openu.ac.il Amin Karbasi Yale University amin.karbasi@yale.edu Ehsan Kazemi Yale University ehsan.kazemi@yale.edu
Pseudocode	Yes	Algorithm 1: EXCHANGE-CANDIDATE (S, u) and Algorithm 2: SAMPLE-STREAMING
Open Source Code	No	The paper provides links to third-party codebases (SEQDPP and LOCAL-SEARCH) that are compared against, but not to the source code for the methodology presented in this paper.
Open Datasets	Yes	For our experiments, we use the Open Video Project (OVP) and the You Tube datasets, which have 50 and 39 videos, respectively [10]. In this section, given a dataset of 504,247 Uber pick ups in Manhattan, New York in April 2014 [41]... [41] Uber Dataset. Uber pickups in new york city, 2014. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city.
Dataset Splits	No	The paper describes partitioning video sequences into segments and using Uber pickup locations for experiments, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts.
Hardware Specification	Yes	In these experiments, we used a machine powered by Intel i5, 3.2 GHz processor and 16 GB of RAM.
Software Dependencies	No	The paper mentions specific algorithms (SEQDPP, LOCAL-SEARCH) but does not provide specific version numbers for any software dependencies or libraries used in their implementation or experimental setup.
Experiment Setup	Yes	We follow the experimental setup of [18] for extracting frames from videos, ﬁnding a linear kernel matrix L and evaluating the quality of produced summaries based on their F-score. ... In the ﬁrst experiment, we set the radius of regions to r = 1.5km. ... This time, for = 5, it took only 35 seconds (and 296,023 oracle calls) for our algorithm to find a summary of size k = 54. Additionally, for = 10 and = 20...