Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Lazier Than Lazy Greedy
Authors: Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrak, Andreas Krause
AAAI 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. |
| Researcher Affiliation | Collaboration | Baharan Mirzasoleiman (ETH Zurich), Ashwinkumar Badanidiyuru (Google Research, Mountain View), Amin Karbasi (Yale University), Jan Vondrak (IBM Almaden), Andreas Krause (ETH Zurich) |
| Pseudocode | Yes | Algorithm 1 STOCHASTIC-GREEDY. Input: f : 2^V → ℝ+, k ∈ {1, …, n}. Output: a set A ⊆ V satisfying \|A\| ≤ k. 1: A ← ∅. 2: for (i ← 1; i ≤ k; i ← i + 1) do 3: R ← a random subset obtained by sampling s random elements from V \ A. 4: a_i ← argmax_{a ∈ R} Δ(a \| A). 5: A ← A ∪ {a_i}. 6: return A. |
| Open Source Code | No | The paper does not provide a link to its source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | We used the Parkinson's Telemonitoring dataset (Tsanas et al. 2010) consisting of 5,875 biomedical voice measurements with 22 attributes from people with early-stage Parkinson's disease. We used a set of 10,000 Tiny Images (Torralba, Fergus, and Freeman 2008) where each 32 × 32 RGB image was represented by a 3,072-dimensional vector. In our experiments we used the 12,527-node distribution network provided as part of the Battle of Water Sensor Networks (BWSN) challenge (Ostfeld et al. 2008). |
| Dataset Splits | No | The paper mentions several datasets (Parkinson's Telemonitoring, Tiny Images, Water Network) but does not provide specific details on how these datasets were split into training, validation, or test sets. |
| Hardware Specification | No | In order to compare the computational cost of different methods independently of the concrete implementation and platform, in our experiments we measure the computational cost in terms of the number of function evaluations used. No specific hardware specifications (CPU, GPU, memory, etc.) are mentioned. |
| Software Dependencies | No | The paper does not provide specific details about the software dependencies, such as programming languages, libraries, or solvers with version numbers, used for the experiments. |
| Experiment Setup | Yes | In our experiment we chose a Gaussian kernel with h = 0.75 and σ = 1. We normalized the vectors to zero mean and unit norm. In our experiment we chose d(x, x′) = ‖x − x′‖₂ for the dissimilarity measure. We subtracted from each vector the mean value, normalized it to unit norm, and used the origin as the auxiliary exemplar. |
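The Algorithm 1 pseudocode quoted in the table can be sketched as a short Python function. This is a minimal illustration, not the authors' implementation: the function name `stochastic_greedy`, the `eps` accuracy parameter, and the objective passed as a plain callable are assumptions; the sample size s = (n/k)·ln(1/ε) follows the paper's analysis.

```python
import math
import random

def stochastic_greedy(f, ground_set, k, eps=0.1, seed=0):
    """Sketch of STOCHASTIC-GREEDY for a monotone submodular f.

    f          -- callable mapping a frozenset to a float (hypothetical interface)
    ground_set -- iterable of elements V
    k          -- cardinality constraint |A| <= k
    eps        -- accuracy parameter; sample size s = ceil((n/k) * ln(1/eps))
    """
    rng = random.Random(seed)
    V = list(ground_set)
    n = len(V)
    # Per-iteration random sample size from the paper's analysis, capped at n.
    s = min(n, max(1, math.ceil((n / k) * math.log(1 / eps))))
    A = set()
    for _ in range(k):
        remaining = [v for v in V if v not in A]
        if not remaining:
            break
        # R: s random elements from V \ A (line 3 of Algorithm 1).
        R = rng.sample(remaining, min(s, len(remaining)))
        # Pick the element of R with the largest marginal gain Delta(a|A).
        base = f(frozenset(A))
        best = max(R, key=lambda a: f(frozenset(A | {a})) - base)
        A.add(best)
    return A
```

With a very small `eps` the sample covers the whole remaining ground set and the run coincides with ordinary greedy; larger `eps` trades a (1 − 1/e − ε) guarantee for roughly a factor-k fewer function evaluations per step.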