Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures
Authors: Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, Ashkan Jafarpour
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We derive the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. Based on intuitive spectral reasoning, it approximates mixtures of k spherical Gaussians in d dimensions to within ℓ1 distance ϵ using O(dk^9 (log^2 d)/ϵ^4) samples and O_{k,ϵ}(d^3 log^5 d) computation time. Conversely, we show that any estimator requires Ω(dk/ϵ^2) samples, hence the algorithm's sample complexity is nearly optimal in the dimension. The implied time-complexity factor O_{k,ϵ} is exponential in k, but much smaller than previously known. We also construct a simple estimator for one-dimensional Gaussian mixtures that uses O(k/ϵ^2) samples and O((k/ϵ)^{3k+1}) computation time. (A toy illustration of the ℓ1 criterion appears below this table.) |
| Researcher Affiliation | Academia | Jayadev Acharya (MIT, jayadev@mit.edu); Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh (UC San Diego, {ashkan, alon, asuresh}@ucsd.edu) |
| Pseudocode | Yes | Algorithm LEARN k-SPHERE. Input: n samples x(1), x(2), ..., x(n) from f, and ϵ. 1. Sample variance: σ̂^2 = min_{a≠b, a,b ∈ [k+1]} ‖x(a) − x(b)‖_2^2 / (2d). 2. Coarse single-linkage clustering: start with each sample as its own cluster; while there exist two clusters with squared distance ≤ 2dσ̂^2 + 23σ̂^2 √(d log(n^2/δ)), merge them. 3. Recursive spectral clustering: while there is a cluster C with |C| ≥ nϵ/(5k) whose sample covariance matrix has spectral norm ≥ 12k^2 σ̂^2 log(n^3/δ): use nϵ/(8k^2) of the samples to find the largest eigenvector and discard those samples; project the remaining samples onto that eigenvector; perform single-linkage in the projected space (as before) until the distance between clusters is > 3σ̂ √(log(n^2 k/δ)), creating new clusters. 4. Exhaustive search: let ϵ_g = ϵ/(16k^{3/2}), L = 200 k^4 ϵ^{-1} log(n^2/δ), L′ = 32√d k^2/ϵ, G = {−L, ..., −ϵ_g, 0, ϵ_g, 2ϵ_g, ..., L}, W = {0, ϵ/(4k), 2ϵ/(4k), ..., 1}, and Σ := {σ^2 : σ^2 = σ̂^2 (1 + iϵ√d/(128dk^2)), −L′ < i ≤ L′}. For each cluster C find its top k−1 eigenvectors u_1, ..., u_{k−1}; let Span(C) = {μ̂(C) + Σ_{i=1}^{k−1} g_i σ̂ u_i : g_i ∈ G} and Span = ∪_{C : |C| ≥ nϵ/(5k)} Span(C). For all w′_i ∈ W, σ′^2 ∈ Σ, and μ̂_i ∈ Span, add (w′_1, ..., w′_{k−1}, 1 − Σ_{i=1}^{k−1} w′_i, N(μ̂_1, σ′^2 I_d), ..., N(μ̂_k, σ′^2 I_d)) to F. 5. Run MODIFIED SCHEFFE on F and output the resulting distribution. (A hedged Python sketch of steps 1–2 appears below this table.) |
| Open Source Code | No | The paper describes algorithms and theoretical results, but it does not include an explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical, analyzing sample complexity and algorithmic performance. It discusses 'samples' in a general, abstract sense for its theoretical arguments (e.g., 'Given n samples from the k-component mixture'), but it does not name any concrete, publicly available dataset, nor does it provide links, DOIs, or citations to datasets for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments involving dataset splits. Therefore, it does not provide specific data split information for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithmic design and complexity analysis. It does not describe the execution of experiments, and thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and mathematical proofs. It does not describe any implementation details or provide specific software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and describes algorithms and their theoretical properties (sample and time complexity). It does not include details on an experimental setup, such as specific hyperparameter values or training configurations. |
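
As referenced in the Research Type row, the paper's guarantee is approximation in ℓ1 distance (twice the total-variation distance). The snippet below is a toy, hypothetical illustration of that criterion, not anything from the paper: it draws samples from a k-component spherical Gaussian mixture and Monte-Carlo-estimates the ℓ1 distance to a perturbed mixture via the identity ‖f − g‖_1 = E_{x~f} |1 − g(x)/f(x)|. All sizes, seeds, and the perturbation are illustrative assumptions.

```python
import numpy as np

# Illustrative (hypothetical) sizes; the paper's regime is high-dimensional.
d, k, n = 50, 3, 10_000
rng = np.random.default_rng(0)

# A k-component spherical mixture: weights w, means mu (k x d), shared sigma.
w = np.array([0.5, 0.3, 0.2])
mu = 5.0 * rng.normal(size=(k, d))
sigma = 1.0

def sample_mixture(w, mu, sigma, n, rng):
    """Draw n samples: pick a component by weight, add spherical noise."""
    comps = rng.choice(len(w), size=n, p=w)
    return mu[comps] + sigma * rng.normal(size=(n, mu.shape[1]))

def mixture_logpdf(x, w, mu, sigma):
    """Log-density of a spherical Gaussian mixture at each row of x."""
    d = mu.shape[1]
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    log_comp = -0.5 * sq / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    return np.logaddexp.reduce(np.log(w) + log_comp, axis=1)

# Monte Carlo estimate of ||f - g||_1 = E_{x~f} |1 - g(x)/f(x)|,
# evaluated in log space for numerical stability in high dimension.
x = sample_mixture(w, mu, sigma, n, rng)
mu_est = mu + 0.1 * rng.normal(size=mu.shape)  # stand-in "estimate" of f
lf = mixture_logpdf(x, w, mu, sigma)
lg = mixture_logpdf(x, w, mu_est, sigma)
print("estimated l1 distance:", np.mean(np.abs(1.0 - np.exp(lg - lf))))
```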
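Since no reference implementation accompanies the paper (see the Open Source Code row), here is a hedged sketch of steps 1–2 of LEARN k-SPHERE as transcribed in the Pseudocode row: the pairwise-minimum variance estimate and the coarse single-linkage merge loop. The function name, the default failure probability δ, and the naive quadratic merge loop are our own choices; the constants follow the reconstructed pseudocode and are not verified against the authors' code.

```python
import numpy as np
from itertools import combinations

def learn_ksphere_steps_1_2(x, k, delta=0.01):
    """Hedged sketch of steps 1-2 of LEARN k-SPHERE (not the authors' code).

    x: (n, d) array of samples; k: number of components; delta: assumed
    failure probability. Constants follow the pseudocode transcribed above.
    """
    n, d = x.shape

    # Step 1: among any k+1 samples, two must share a component, so the
    # minimum pairwise squared distance concentrates around 2*d*sigma^2.
    sigma2 = min(np.sum((x[a] - x[b]) ** 2)
                 for a, b in combinations(range(k + 1), 2)) / (2 * d)

    # Step 2: coarse single-linkage clustering. Merge any two clusters whose
    # closest pair of points falls within the concentration threshold.
    thresh = 2 * d * sigma2 + 23 * sigma2 * np.sqrt(d * np.log(n**2 / delta))
    clusters = [[i] for i in range(n)]
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                closest = min(np.sum((x[a] - x[b]) ** 2)
                              for a in clusters[i] for b in clusters[j])
                if closest <= thresh:  # single-linkage merge rule
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return sigma2, clusters
```

Steps 3–5 (the spectral recursion, the candidate grids G, W, Σ, and MODIFIED SCHEFFE selection over F) are omitted here; they would operate on the clusters this sketch returns.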