Fast Distributed Submodular Cover: Public-Private Data Summarization

Authors: Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we evaluate the performance of FASTCOVER on the three applications that we described in Section 3: personalized movie recommendation, personalized location recommendation, and dominating set on social networks. To validate our theoretical results and demonstrate the effectiveness of FASTCOVER, we compare the performance of our algorithm against DISCOVER and the centralized greedy algorithm (when possible)."
Researcher Affiliation | Collaboration | Baharan Mirzasoleiman (ETH Zurich), Morteza Zadimoghaddam (Google Research), Amin Karbasi (Yale University)
Pseudocode | Yes | Algorithm 1: FASTCOVER and Algorithm 2: THRESHOLDSAMPLE
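The paper's pseudocode itself is not reproduced in this report. As a rough illustration of the threshold-based greedy idea that a name like THRESHOLDSAMPLE suggests, the sketch below adds every element whose marginal gain clears a geometrically decreasing threshold until a coverage target Q is met. The function name, stopping rule, and threshold schedule are all illustrative assumptions, not the authors' algorithms.

```python
def submodular_cover_threshold(f, ground_set, Q, eps=0.1):
    """Illustrative threshold-greedy sketch for submodular cover.

    f: monotone submodular set function, called as f(S) on a set S.
    Q: required coverage value; selection stops once f(S) >= Q.
    NOTE: a hypothetical sketch, not the paper's Algorithm 1 or 2.
    """
    S = set()
    # Highest single-element gain sets the initial threshold.
    M = max(f({e}) for e in ground_set)
    tau = M
    while f(S) < Q and tau > eps * M / len(ground_set):
        for e in ground_set:
            if e in S:
                continue
            gain = f(S | {e}) - f(S)
            if gain >= tau:      # keep elements clearing the current threshold
                S.add(e)
                if f(S) >= Q:    # coverage target reached, stop early
                    break
        tau *= (1 - eps)         # geometrically lower the threshold
    return S
```

With a toy coverage function f(S) = size of the union of the chosen sets, the sketch returns a subset whose coverage meets Q whenever Q is attainable.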
Open Source Code | No | The paper states that FASTCOVER was implemented on Spark but provides no link to, or explicit statement about, publicly available source code.
Open Datasets | No | The paper describes the datasets used for each task (location recommendation, movie recommendation, dominating set) and mentions keeping '20% of her data private' for the location data, but this is a privacy partition, not a training split in the traditional supervised-learning sense; the paper gives no information about training splits.
Dataset Splits | No | The paper does not explicitly mention a validation set or validation split for its experiments.
Hardware Specification | Yes | "Our experimental infrastructure was a cluster of 16 quad-core machines with 20GB of memory each, running Spark."
Software Dependencies | No | The paper mentions "running Spark" but specifies no version number for Spark or for any other software dependency.
Experiment Setup | Yes | "We set the number of reducers to m = 60. To run FASTCOVER on Spark, we first distributed the data uniformly at random to the machines, and performed a map/reduce task to find the highest marginal gain τ = M. Each machine then carries out a set of map/reduce tasks in sequence, where each map/reduce stage filters out elements with a specific threshold τ on the whole dataset. We then tune the parameter τ, communicate back the results to the machines and perform another round of map/reduce calculation. The parameter α_u is set randomly for each user u. ... The parameter α_u is set to 0.7 for all users. We scaled down the number of iterations by a factor of 0.01, so that the corresponding bars can be shown in the same figures."
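The map/reduce workflow quoted above (random partitioning, a first pass to find the maximum marginal gain τ = M, then repeated threshold-filtering rounds with a tuned τ) can be mimicked on a single machine to make the control flow concrete. Everything below, including the partition layout, the gain-function interface, and the decay rate eps, is an illustrative assumption, not the authors' Spark implementation.

```python
import random


def fastcover_simulation(f, items, Q, m=4, eps=0.2, seed=0):
    """Single-machine mock-up of the described map/reduce rounds.

    Data is spread uniformly at random over m simulated 'machines';
    each round filters, per partition, the elements whose marginal
    gain clears the current threshold tau, then tau is lowered and
    another round runs. Hypothetical sketch only.
    """
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for e in items:
        parts[rng.randrange(m)].append(e)  # uniform random distribution

    # First map/reduce pass: find the highest marginal gain tau = M.
    M = max(f({e}) for e in items)
    tau, S = M, set()

    while f(S) < Q and tau > eps * M / max(len(items), 1):
        for part in parts:                 # 'map' stage on each machine
            for e in part:
                if e not in S and f(S | {e}) - f(S) >= tau:
                    S.add(e)               # element clears the threshold
        tau *= (1 - eps)                   # tune tau for the next round
    return S
```

In the real system the per-partition selections would be merged in a reduce step before τ is retuned; the sketch collapses that into a shared solution set to keep the round structure visible.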