Fast Distributed Submodular Cover: Public-Private Data Summarization
Authors: Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of FASTCOVER on the three applications that we described in Section 3: personalized movie recommendation, personalized location recommendation, and dominating set on social networks. To validate our theoretical results and demonstrate the effectiveness of FASTCOVER, we compare the performance of our algorithm against DISCOVER and the centralized greedy algorithm (when possible). (A coverage-function sketch for the dominating-set application appears below the table.) |
| Researcher Affiliation | Collaboration | Baharan Mirzasoleiman (ETH Zurich), Morteza Zadimoghaddam (Google Research), Amin Karbasi (Yale University) |
| Pseudocode | Yes | Algorithm 1: FASTCOVER and Algorithm 2: THRESHOLDSAMPLE |
| Open Source Code | No | The paper states that FASTCOVER was implemented with Spark but does not provide any links or explicit statements about the public availability of the source code. |
| Open Datasets | No | The paper describes the datasets used for each task (e.g., location recommendation, movie recommendation, dominating set) and mentions keeping '20% of her data private' for the location data, but this is a privacy partition of each user's data rather than a statement about dataset availability; the paper does not explicitly state that the datasets are publicly released or provide access links. |
| Dataset Splits | No | The paper does not explicitly mention a validation set or validation split for its experiments. |
| Hardware Specification | Yes | Our experimental infrastructure was a cluster of 16 quad-core machines with 20GB of memory each, running Spark. |
| Software Dependencies | No | The paper mentions 'running Spark' but does not specify a version number for Spark or any other ancillary software components. |
| Experiment Setup | Yes | We set the number of reducers to m = 60. To run FASTCOVER on Spark, we first distributed the data uniformly at random to the machines and performed a map/reduce task to find the highest marginal gain τ = M. Each machine then carries out a sequence of map/reduce tasks, where each stage filters the whole dataset against a specific threshold τ. We then update the parameter τ, communicate the results back to the machines, and perform another round of map/reduce computation. The parameter α_u is set randomly for each user u. ... The parameter α_u is set to 0.7 for all users. We scaled down the number of iterations by a factor of 0.01 so that the corresponding bars fit in the same figures. (A hedged sketch of this threshold-based filtering appears below the table.) |
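
For the dominating-set application mentioned in the Research Type row, the following is a minimal sketch of a coverage objective together with the centralized greedy cover baseline the paper compares against. The graph encoding, function names, and toy instance are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumption): a dominating-set coverage objective and the
# centralized greedy submodular cover baseline used for comparison.
def coverage(graph, selected):
    """Count nodes dominated by `selected`: the chosen nodes plus their neighbors."""
    covered = set(selected)
    for v in selected:
        covered.update(graph.get(v, ()))
    return len(covered)

def greedy_cover(graph, target):
    """Add the node with the largest marginal gain until `target` nodes are covered."""
    selected = []
    while coverage(graph, selected) < target:
        base = coverage(graph, selected)
        best, best_gain = None, 0
        for v in graph:
            gain = coverage(graph, selected + [v]) - base
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no remaining node improves coverage
            break
        selected.append(best)
    return selected

# Toy usage: dominate at least 5 of the 6 nodes of a small undirected graph.
toy_graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4, 6], 6: [5]}
print(greedy_cover(toy_graph, target=5))  # e.g. [1, 5]
```

The recommendation applications replace this coverage function with their own monotone submodular utilities; the cover structure of the problem stays the same.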
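The Experiment Setup row describes rounds in which each machine filters its elements against a threshold τ that is lowered between rounds. Below is a minimal single-process sketch of that threshold-filtering idea, in the spirit of THRESHOLDSAMPLE, assuming the `coverage` function and `toy_graph` from the sketch above; the partitioning, the decrease factor `eps`, and all identifiers are illustrative assumptions, not the paper's Spark implementation.

```python
# Sketch (assumption): threshold-based filtering rounds in the spirit of
# THRESHOLDSAMPLE, reusing `coverage` and `toy_graph` from the sketch above.
import random

def threshold_cover(graph, target, num_machines=4, eps=0.5, seed=0):
    rng = random.Random(seed)
    elements = list(graph)
    rng.shuffle(elements)
    # "Map" side: spread the ground set uniformly at random over machines.
    partitions = [elements[i::num_machines] for i in range(num_machines)]

    selected = []
    # Initial threshold tau = M, the largest single-element marginal gain.
    tau = max(coverage(graph, [v]) for v in graph)
    while coverage(graph, selected) < target and tau >= 1:
        # Each machine scans only its own partition and keeps the elements
        # whose marginal gain w.r.t. the current solution still meets tau.
        for part in partitions:
            for v in part:
                gain = coverage(graph, selected + [v]) - coverage(graph, selected)
                if gain >= tau:
                    selected.append(v)
                    if coverage(graph, selected) >= target:
                        return selected
        # Driver side: lower the threshold and start another filtering round.
        tau *= eps
    return selected

print(threshold_cover(toy_graph, target=5))  # e.g. [2, 5]
```

In a distributed run, each inner loop over a partition would execute on its own machine, with only the current solution and the threshold communicated between rounds, which is what keeps the number of synchronization rounds small in this sketch.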