Lazier Than Lazy Greedy

Authors: Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrak, Andreas Krause

AAAI 2015

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response.
Research Type: Experimental
"We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster."
Researcher Affiliation: Collaboration
- Baharan Mirzasoleiman, ETH Zurich (baharanm@inf.ethz.ch)
- Ashwinkumar Badanidiyuru, Google Research Mountain View (ashwinkumarbv@google.com)
- Amin Karbasi, Yale University (amin.karbasi@yale.edu)
- Jan Vondrak, IBM Almaden (jvondrak@us.ibm.com)
- Andreas Krause, ETH Zurich (krausea@ethz.ch)
Pseudocode: Yes

Algorithm 1 STOCHASTIC-GREEDY
Input: f : 2^V → R+, k ∈ {1, ..., n}.
Output: A set A ⊆ V satisfying |A| ≤ k.
1: A ← ∅
2: for (i ← 1; i ≤ k; i ← i + 1) do
3:   R ← a random subset obtained by sampling s random elements from V \ A
4:   a_i ← argmax_{a ∈ R} Δ(a | A)
5:   A ← A ∪ {a_i}
6: return A
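The pseudocode above can be sketched in Python. This is a minimal illustrative implementation, assuming the objective f is a monotone submodular set function supplied as a callable; the sample size s = (n/k) ln(1/ε) follows the paper's analysis, while the function and variable names are this sketch's own. The marginal gain Δ(a|A) is computed as f(A ∪ {a}) − f(A).

```python
import math
import random

def stochastic_greedy(f, ground_set, k, eps=0.1, rng=random):
    """Sketch of STOCHASTIC-GREEDY (Algorithm 1), assuming f is a
    monotone submodular set function given as a Python callable."""
    n = len(ground_set)
    # Sample size from the paper's analysis: s = (n/k) * ln(1/eps),
    # clipped to the ground-set size.
    s = min(n, max(1, math.ceil((n / k) * math.log(1 / eps))))
    A = set()
    for _ in range(k):
        remaining = [v for v in ground_set if v not in A]
        if not remaining:
            break
        # Sample s candidates uniformly at random from V \ A.
        R = rng.sample(remaining, min(s, len(remaining)))
        f_A = f(A)
        # Add the sampled element with the largest marginal gain.
        best = max(R, key=lambda a: f(A | {a}) - f_A)
        A.add(best)
    return A
```

For example, with a toy coverage objective (which is monotone submodular), `stochastic_greedy(coverage, list(sets), 2)` selects two sets that together cover the most items; with small n the sample size equals the ground set and the run reduces to plain greedy.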
Open Source Code: No
The paper does not provide a link to its source code or explicitly state that the code is publicly available.
Open Datasets: Yes
"We used the Parkinsons Telemonitoring dataset (Tsanas et al. 2010) consisting of 5,875 biomedical voice measurements with 22 attributes from people with early-stage Parkinson's disease." "We used a set of 10,000 Tiny Images (Torralba, Fergus, and Freeman 2008) where each 32 × 32 RGB image was represented by a 3,072-dimensional vector." "In our experiments we used the 12,527-node distribution network provided as part of the Battle of the Water Sensor Networks (BWSN) challenge (Ostfeld et al. 2008)."
Dataset Splits: No
The paper mentions several datasets (Parkinsons Telemonitoring, Tiny Images, Water Network) but does not provide specific details on how these datasets were split into training, validation, or test sets.
Hardware Specification: No
"In order to compare the computational cost of different methods independently of the concrete implementation and platform, in our experiments we measure the computational cost in terms of the number of function evaluations used." No specific hardware (CPU, GPU, memory, etc.) is mentioned.
Software Dependencies: No
The paper does not provide specific details about the software dependencies, such as programming languages, libraries, or solvers with version numbers, used for the experiments.
Experiment Setup: Yes
"In our experiment we chose a Gaussian kernel with h = 0.75 and σ = 1. We normalized the vectors to zero mean and unit norm." "In our experiment we chose d(x, x') = ||x - x'||^2 for the dissimilarity measure. We subtracted from each vector the mean value, normalized it to unit norm, and used the origin as the auxiliary exemplar."