Fast Distributed Submodular Cover: Public-Private Data Summarization

Authors: Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we evaluate the performance of FASTCOVER on the three applications that we described in Section 3: personalized movie recommendation, personalized location recommendation, and dominating set on social networks. To validate our theoretical results and demonstrate the effectiveness of FASTCOVER, we compare the performance of our algorithm against DISCOVER and the centralized greedy algorithm (when possible)."
Researcher Affiliation | Collaboration | Baharan Mirzasoleiman (ETH Zurich), Morteza Zadimoghaddam (Google Research), Amin Karbasi (Yale University)
Pseudocode | Yes | Algorithm 1: FASTCOVER and Algorithm 2: THRESHOLDSAMPLE
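The paper's pseudocode itself is not reproduced in this report. As a rough illustration of the threshold-based greedy idea that a name like THRESHOLDSAMPLE suggests, the sketch below adds every element whose marginal gain clears a geometrically decreasing threshold until a coverage target Q is met. The function name, stopping rule, and threshold schedule are all illustrative assumptions, not the authors' algorithms.

```python
def submodular_cover_threshold(f, ground_set, Q, eps=0.1):
    """Illustrative threshold-greedy sketch for submodular cover.

    f: monotone submodular set function, called as f(S) on a set S.
    Q: required coverage value; selection stops once f(S) >= Q.
    NOTE: a hypothetical sketch, not the paper's Algorithm 1 or 2.
    """
    S = set()
    # Highest single-element gain sets the initial threshold.
    M = max(f({e}) for e in ground_set)
    tau = M
    while f(S) < Q and tau > eps * M / len(ground_set):
        for e in ground_set:
            if e in S:
                continue
            gain = f(S | {e}) - f(S)
            if gain >= tau:      # keep elements clearing the current threshold
                S.add(e)
                if f(S) >= Q:    # coverage target reached, stop early
                    break
        tau *= (1 - eps)         # geometrically lower the threshold
    return S
```

With a toy coverage function f(S) = size of the union of the chosen sets, the sketch returns a subset whose coverage meets Q whenever Q is attainable.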
Open Source Code | No | The paper states that FASTCOVER was implemented on Spark but provides no link to, or explicit statement about, publicly available source code.
Open Datasets | No | The paper describes the datasets used for each task (location recommendation, movie recommendation, dominating set) and mentions keeping '20% of her data private' for the location data, but this is a privacy partition, not a training split in the traditional supervised-learning sense; the paper gives no information about training splits.
Dataset Splits | No | The paper does not explicitly mention a validation set or validation split for its experiments.
Hardware Specification | Yes | "Our experimental infrastructure was a cluster of 16 quad-core machines with 20GB of memory each, running Spark."
Software Dependencies | No | The paper mentions "running Spark" but specifies no version number for Spark or for any other software dependency.
Experiment Setup | Yes | "We set the number of reducers to m = 60. To run FASTCOVER on Spark, we first distributed the data uniformly at random to the machines, and performed a map/reduce task to find the highest marginal gain τ = M. Each machine then carries out a set of map/reduce tasks in sequence, where each map/reduce stage filters out elements with a specific threshold τ on the whole dataset. We then tune the parameter τ, communicate back the results to the machines and perform another round of map/reduce calculation. The parameter α_u is set randomly for each user u. ... The parameter α_u is set to 0.7 for all users. We scaled down the number of iterations by a factor of 0.01, so that the corresponding bars can be shown in the same figures."
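The map/reduce workflow quoted above (random partitioning, a first pass to find the maximum marginal gain τ = M, then repeated threshold-filtering rounds with a tuned τ) can be mimicked on a single machine to make the control flow concrete. Everything below, including the partition layout, the gain-function interface, and the decay rate eps, is an illustrative assumption, not the authors' Spark implementation.

```python
import random


def fastcover_simulation(f, items, Q, m=4, eps=0.2, seed=0):
    """Single-machine mock-up of the described map/reduce rounds.

    Data is spread uniformly at random over m simulated 'machines';
    each round filters, per partition, the elements whose marginal
    gain clears the current threshold tau, then tau is lowered and
    another round runs. Hypothetical sketch only.
    """
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for e in items:
        parts[rng.randrange(m)].append(e)  # uniform random distribution

    # First map/reduce pass: find the highest marginal gain tau = M.
    M = max(f({e}) for e in items)
    tau, S = M, set()

    while f(S) < Q and tau > eps * M / max(len(items), 1):
        for part in parts:                 # 'map' stage on each machine
            for e in part:
                if e not in S and f(S | {e}) - f(S) >= tau:
                    S.add(e)               # element clears the threshold
        tau *= (1 - eps)                   # tune tau for the next round
    return S
```

In the real system the per-partition selections would be merged in a reduce step before τ is retuned; the sketch collapses that into a shared solution set to keep the round structure visible.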