Distributed Submodular Cover: Succinctly Summarizing Massive Data

Authors: Baharan Mirzasoleiman, Amin Karbasi, Ashwinkumar Badanidiyuru, Andreas Krause

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including active set selection, exemplar based clustering, and vertex cover on tens of millions of data points using Spark.
Researcher Affiliation | Collaboration | Baharan Mirzasoleiman (ETH Zurich); Amin Karbasi (Yale University); Ashwinkumar Badanidiyuru (Google); Andreas Krause (ETH Zurich)
Pseudocode | Yes | Algorithm 1 Approximate Submodular Cover; Algorithm 2 Approximate OPTCARD; Algorithm 3 DISCOVER; Algorithm 4 Greedy Distributed Submodular Maximization (GREEDI)
Open Source Code | No | The paper does not provide a link or explicit statement about the availability of its source code.
Open Datasets | Yes | We perform our experiments on a set of 10,000 Tiny Images [28]... We use the Parkinsons Telemonitoring dataset [29]... As our large scale experiment, we applied DISCOVER to the Friendster network... [30]
Dataset Splits | No | The paper describes running experiments on datasets and evaluating coverage percentages, but it does not specify train, validation, or test splits for data partitioning or model training in the conventional sense.
Hardware Specification | Yes | Our experimental infrastructure was a cluster of 8 quad-core machines with 32GB of memory each, running Spark.
Software Dependencies | No | The paper mentions 'running Spark' but does not specify the version number of Spark or any other software dependencies.
Experiment Setup | Yes | We set the number of reducers to m = 64... We first distributed the data uniformly at random to the machines, where each machine received 1,025,130 vertices (12.5GB RAM). Then we start with ℓ = 1, perform a map/reduce task to extract one element... We examine the performance of DISCOVER by obtaining covers for 50%, 30%, 20% and 10% of the whole graph... α = 1.
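
Since no source code is released, the following is a minimal single-process Python sketch of the kind of procedure the Pseudocode and Experiment Setup entries point to: partition the data at random over m machines, run greedy selection of up to ℓ elements on each partition, then run greedy again on the merged candidates. It is not the authors' implementation and does not reproduce Algorithms 1 to 4. The function names, the coverage objective (a vertex covers itself and its neighbors), and the doubling of ℓ until the target fraction is reached are all assumptions made for illustration; the quoted setup elides how ℓ is actually updated.

```python
import random

def coverage(neighbors, chosen):
    """Count vertices covered by `chosen` (assumed objective: a vertex
    covers itself and its neighbors)."""
    covered = set()
    for v in chosen:
        covered.add(v)
        covered.update(neighbors[v])
    return len(covered)

def greedy(neighbors, candidates, ell):
    """Standard greedy: pick up to `ell` candidates by largest marginal gain."""
    chosen, covered = [], set()
    remaining = set(candidates)
    for _ in range(ell):
        best, best_gain = None, 0
        for v in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = len(({v} | neighbors[v]) - covered)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # nothing left adds coverage
            break
        chosen.append(best)
        covered |= {best} | neighbors[best]
        remaining.discard(best)
    return chosen

def two_round_greedy(neighbors, ell, m, seed=0):
    """Two-round, GreeDi-style simulation (hypothetical helper):
    round 1 runs greedy on each of m random partitions,
    round 2 runs greedy on the union of the local solutions."""
    rng = random.Random(seed)
    vertices = sorted(neighbors)
    rng.shuffle(vertices)
    parts = [vertices[i::m] for i in range(m)]
    local = [greedy(neighbors, part, ell) for part in parts]
    merged = [v for sol in local for v in sol]
    final = greedy(neighbors, merged, ell)
    # Keep the best of the merged solution and the local solutions.
    return max(local + [final], key=lambda s: coverage(neighbors, s))

def cover_fraction(neighbors, fraction, m):
    """Grow ell until the target fraction is covered.
    Doubling ell is an assumption of this sketch, not taken from the paper."""
    target = fraction * len(neighbors)
    ell = 1
    while True:
        sol = two_round_greedy(neighbors, ell, m)
        if coverage(neighbors, sol) >= target or ell >= len(neighbors):
            return sol
        ell *= 2
```

In a real Spark job the per-partition greedy step would run inside the map phase and the merge step inside the reduce phase; this sketch keeps everything in one process so that the control flow is easy to follow.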