An Efficient Streaming Algorithm for the Submodular Cover Problem

Authors: Ashkan Norouzi-Fard, Abbas Bazzi, Ilija Bogunovic, Marwa El Halabi, Ya-Ping Hsieh, Volkan Cevher

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance supported by numerical evidence. In our numerical experiments, we evaluate the performance of ESC-Streaming on active set selection and large-scale graph cover problems. We evaluate the performance of ESC-Streaming on real-world data sets with two applications: active set selection and graph set cover problems, described in section 5.
Researcher Affiliation Academia Ashkan Norouzi-Fard ashkan.norouzifard@epfl.ch, Abbas Bazzi abbas.bazzi@epfl.ch, Marwa El Halabi marwa.elhalabi@epfl.ch, Ilija Bogunovic ilija.bogunovic@epfl.ch, Ya-Ping Hsieh ya-ping.hsieh@epfl.ch, Volkan Cevher volkan.cevher@epfl.ch. Theory of Computation Laboratory 2 (THL2), EPFL. Laboratory for Information and Inference Systems (LIONS), EPFL
Pseudocode Yes Algorithm 1 ESC-Streaming Algorithm Picking representative set. Algorithm 2 ESC-Streaming Algorithm Responding to the queries
Open Source Code No The paper does not explicitly state that source code for the methodology is openly available or provide a link to a repository.
Open Datasets Yes We evaluate the performance of ESC-Streaming on real-world data sets with two applications: active set selection and graph set cover problems, described in section 5. For active set selection, we choose a dataset having a size that permits the comparison with the offline greedy algorithm. The dataset consists of 7k small organic molecules, each represented by a 276 dimensional vector. For graph cover, we run ESC-Streaming on a large graph of 787 million nodes and 47.6 billion edges. We apply it to the "uk-2014" graph, a large snapshot of the .uk domain taken at the end of 2014 [5, 4, 3]. We include another instance of the Dominating set cover problem on a smaller graph Friendster", an online gaming network [29].
Dataset Splits No The paper does not explicitly specify exact percentages or sample counts for training/validation/test splits, nor does it provide citations to predefined splits for its specific experiments.
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU/GPU models, memory sizes) used for running its experiments.
Software Dependencies No The paper does not provide specific software dependency details with version numbers.
Experiment Setup Yes For active set selection, we apply ESC-Streaming on the log-det function defined in Section 5.1 where we use the Gaussian kernel Kij = exp( kxi xjk2 / 2h2 ), and we set the hyperparameters as in [24]: σ = 1, h = 724. For the Dominating Set problem, we set M = 520 MB, = 2 and Q = 0.7|V |. Similarly, for the Vertex Cover problem, we set M = 320 MB, = 2 and Q = 0.8|E|.