Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures

Authors: Ananda Theertha Suresh, Alon Orlitsky, Jayadev Acharya, Ashkan Jafarpour

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We derive the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. Based on intuitive spectral reasoning, it approximates mixtures of k spherical Gaussians in d dimensions to within ℓ1 distance ϵ using O(dk^9 (log^2 d)/ϵ^4) samples and O_{k,ϵ}(d^3 log^5 d) computation time. Conversely, we show that any estimator requires Ω(dk/ϵ^2) samples, hence the algorithm's sample complexity is nearly optimal in the dimension. The implied time-complexity factor O_{k,ϵ} is exponential in k, but much smaller than previously known. We also construct a simple estimator for one-dimensional Gaussian mixtures that uses O(k/ϵ^2) samples and O((k/ϵ)^(3k+1)) computation time.
Researcher Affiliation | Academia | Jayadev Acharya (MIT, jayadev@mit.edu); Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh (UC San Diego, {ashkan, alon, asuresh}@ucsd.edu)
Pseudocode | Yes | Algorithm LEARN k-SPHERE. Input: n samples x(1), x(2), ..., x(n) from f, and ϵ.
1. Sample variance: σ̂^2 = min_{a ≠ b; a, b ∈ [k+1]} ||x(a) − x(b)||_2^2 / (2d).
2. Coarse single-linkage clustering: start with each sample as a cluster; while there are two clusters with squared distance ≤ 2dσ̂^2 + 23σ̂^2 √(d log(n^2/δ)), merge them.
3. Recursive spectral-clustering: while there is a cluster C with |C| ≥ nϵ/(5k) whose sample covariance matrix has spectral norm ≥ 12k^2 σ̂^2 log(n^3/δ): use nϵ/(8k^2) of the samples to find the largest eigenvector and discard these samples; project the remaining samples onto the largest eigenvector; perform single-linkage in the projected space (as before) until the distance between clusters is > 3σ̂ √(log(n^2 k/δ)), creating new clusters.
4. Exhaustive search: let ϵ_g = ϵ/(16 k^(3/2)), L = 200 k^4 ϵ^(−1) log(n^2/δ), L′ = 32k, G = {−L, ..., −ϵ_g, 0, ϵ_g, 2ϵ_g, ..., L}, W = {0, ϵ/(4k), 2ϵ/(4k), ..., 1}, and Σ := {σ′^2 : σ′^2 = σ̂^2 (1 + iϵ√d/(128dk^2)), −L′ < i ≤ L′}. For each cluster C, find its top k−1 eigenvectors u_1, ..., u_{k−1} and let Span(C) = {μ̂(C) + Σ_{i=1}^{k−1} g_i σ̂ u_i : g_i ∈ G}; let Span = ∪_{C : |C| ≥ nϵ/(5k)} Span(C). For all w′_i ∈ W, σ′^2 ∈ Σ, μ̂_i ∈ Span, add (w′_1, ..., w′_{k−1}, 1 − Σ_{i=1}^{k−1} w′_i, N(μ̂_1, σ′^2), ..., N(μ̂_k, σ′^2)) to F.
5. Run MODIFIED SCHEFFE on F and output the resulting distribution. (A code sketch of steps 1–2 appears after this table.)
Open Source Code | No | The paper describes algorithms and theoretical results but does not include any explicit statement about releasing source code for the methodology, nor a link to a code repository.
Open Datasets | No | The paper is theoretical, analyzing sample complexity and algorithmic performance. It discusses 'samples' in a general, abstract sense for its theoretical arguments (e.g., 'Given n samples from the k-component mixture'), but it does not specify any concrete, publicly available datasets or provide links, DOIs, or citations to datasets for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not conduct experiments involving dataset splits, so it provides no training, validation, or test split information.
Hardware Specification | No | The paper is theoretical and focuses on algorithmic design and complexity analysis. It does not describe the execution of experiments, and no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and mathematical proofs. It does not describe implementation details or list software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and describes algorithms and their sample and time complexity. It does not include an experimental setup, such as specific hyperparameter values or training configurations.
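
As a hedged illustration of the pseudocode quoted above, the following minimal Python/NumPy sketch (not the authors' code, and not part of the paper) implements steps 1 and 2 of LEARN k-SPHERE: estimating the shared variance from the closest pair among the first k+1 samples, then coarse single-linkage merging with the quoted squared-distance threshold. The function names, the demo data, and the exact placement of the square root in the merge threshold are illustrative assumptions layered on the reconstructed pseudocode.

import numpy as np

def estimate_variance(samples, k):
    # Step 1: sigma_hat^2 = min over pairs a != b among the first k+1 samples
    # of ||x(a) - x(b)||_2^2 / (2d).
    head = samples[:k + 1]
    d = samples.shape[1]
    best = min(
        np.sum((head[a] - head[b]) ** 2)
        for a in range(len(head))
        for b in range(a + 1, len(head))
    )
    return best / (2 * d)

def coarse_single_linkage(samples, sigma2, delta):
    # Step 2: start with singleton clusters and merge any two clusters whose
    # single-linkage squared distance is at most
    # 2*d*sigma2 + 23*sigma2*sqrt(d*log(n^2/delta))  (sqrt placement assumed).
    n, d = samples.shape
    threshold = 2 * d * sigma2 + 23 * sigma2 * np.sqrt(d * np.log(n ** 2 / delta))
    clusters = [[i] for i in range(n)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance: closest pair across the two clusters
                dmin = min(
                    np.sum((samples[p] - samples[q]) ** 2)
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if dmin <= threshold:
                    clusters[i] += clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

if __name__ == "__main__":
    # Demo on two well-separated spherical Gaussian components.
    rng = np.random.default_rng(0)
    d, sigma = 50, 1.0
    x = np.vstack([
        rng.normal(0.0, sigma, size=(100, d)),
        rng.normal(10.0, sigma, size=(100, d)),
    ])
    s2 = estimate_variance(x, k=2)
    print("estimated sigma^2:", s2)
    print("coarse clusters found:", len(coarse_single_linkage(x, s2, delta=0.05)))

With components as well separated as in this demo, the coarse step alone recovers the two components; in the full algorithm, steps 3-5 split overlapping clusters spectrally, enumerate candidate mixtures, and select among them with MODIFIED SCHEFFE.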