Fast Multi-stage Submodular Maximization
Authors: Kai Wei, Rishabh Iyer, Jeff Bilmes
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theory by empirically evaluating on several real-world problems, including data subset selection on millions of speech samples where MULTGREED yields at least a thousand times speedup and superior results over the state-of-the-art selection methods. |
| Researcher Affiliation | Academia | University of Washington, Seattle, WA 98195, USA |
| Pseudocode | Yes | Algorithm 1 MULTGREED |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | All simulations are performed on the same data with size |V | = 20, 000, formed by randomly sampling from a large speech recognition training corpus (the Fisher corpus). Each sample pair has a similarity score, and the graph-based submodular functions ffac and fsat are instantiated using the corresponding similarity matrix. A set of features F sized |F| 75000 is derived from the same data to instantiate ffea. In all runs of MULTGREED, we set {βi}ℓ i=1 using the schedule βi = c + (i 1)(1 c) ℓ with c = 0.5. Performance of MULTGREED and LAZYGREED is measured by the function valuations and the wall-clock running time. ... We subselect 1300 hours of conversational English telephone data from the Switchboard , Switchboard Cellular , and Fisher corpora, which, in total, comprises 1,322,108 segments of speech (i.e., |V | = n = 1, 322, 018). ... (King et al., 2005; Lin and Bilmes, 2009; Wei et al., 2013) |
| Dataset Splits | No | The paper does not explicitly state the use of a validation set or provide specific details on train/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | In all runs of MULTGREED, we set {βi}ℓ i=1 using the schedule βi = c + (i 1)(1 c) ℓ with c = 0.5. ... For fsat, the saturation ratio ξ is set as 0.25. Two stages using surrogate functions ˆfmod and ˆfsub are applied, under size constraints ℓ1 = nξ 5(1 ξ)+ξ = 0.05n, and ℓ2 = ℓ ℓ1. ... Next, for ffea, we set g to be the square root function. Two stages of surrogates ˆfmod and ˆfsub are applied. ˆfsub is defined on a randomly selected feature subset of size 37,500. We test with different combinations of size constraints ℓ1 and ℓ2 by setting ℓ1 {0, 0.25ℓ, 0.5ℓ, 0.75ℓ} with ℓ2 = ℓ ℓ1. ... We here test MULTGREED using ˆfk-NNG fac with the sparsity of k-NNG set as k = 1, 000. |