Do Less, Get More: Streaming Submodular Maximization with Subsampling
Authors: Moran Feldman, Amin Karbasi, Ehsan Kazemi
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the lowest number of function evaluations. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold while maintaining practically the same utility. We also evaluated the scalability of our algorithm on a large dataset of Uber pick up locations. |
| Researcher Affiliation | Academia | Moran Feldman Open University of Israel moranfe@openu.ac.il Amin Karbasi Yale University amin.karbasi@yale.edu Ehsan Kazemi Yale University ehsan.kazemi@yale.edu |
| Pseudocode | Yes | Algorithm 1: EXCHANGE-CANDIDATE (S, u) and Algorithm 2: SAMPLE-STREAMING |
| Open Source Code | No | The paper provides links to third-party codebases (SEQDPP and LOCAL-SEARCH) that are compared against, but not to the source code for the methodology presented in this paper. |
| Open Datasets | Yes | For our experiments, we use the Open Video Project (OVP) and the You Tube datasets, which have 50 and 39 videos, respectively [10]. In this section, given a dataset of 504,247 Uber pick ups in Manhattan, New York in April 2014 [41]... [41] Uber Dataset. Uber pickups in new york city, 2014. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city. |
| Dataset Splits | No | The paper describes partitioning video sequences into segments and using Uber pickup locations for experiments, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts. |
| Hardware Specification | Yes | In these experiments, we used a machine powered by Intel i5, 3.2 GHz processor and 16 GB of RAM. |
| Software Dependencies | No | The paper mentions specific algorithms (SEQDPP, LOCAL-SEARCH) but does not provide specific version numbers for any software dependencies or libraries used in their implementation or experimental setup. |
| Experiment Setup | Yes | We follow the experimental setup of [18] for extracting frames from videos, finding a linear kernel matrix L and evaluating the quality of produced summaries based on their F-score. ... In the first experiment, we set the radius of regions to r = 1.5km. ... This time, for = 5, it took only 35 seconds (and 296,023 oracle calls) for our algorithm to find a summary of size k = 54. Additionally, for = 10 and = 20... |