Learning Mixture of Gaussians with Streaming Data

Authors: Aditi Raghunathan, Prateek Jain, Ravishankar Krishnaswamy

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we study the problem of learning a mixture of Gaussians with streaming data: given a stream of N points in d dimensions generated by an unknown mixture of k spherical Gaussians, the goal is to estimate the model parameters using a single pass over the data stream. We analyze a streaming version of the popular Lloyd's heuristic and show that the algorithm estimates all the unknown centers of the component Gaussians accurately if they are sufficiently separated. Our main contribution is the first bias-variance bound for the problem of learning Gaussian mixtures with streaming data.
Researcher Affiliation | Collaboration | Aditi Raghunathan, Stanford University, aditir@stanford.edu; Prateek Jain, Microsoft Research, India, prajain@microsoft.com; Ravishankar Krishnaswamy, Microsoft Research, India, rakri@microsoft.com
Pseudocode | Yes | Algorithm 1 InitAlg(N0) ... Algorithm 2 StreamKmeans(N, N0) ... Algorithm 3 StreamSoftUpdate(N, N0)
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes a synthetic data generation model ('mixture of k spherical Gaussians distributions') but does not specify or provide access information for any publicly available or open dataset.
Dataset Splits | No | The paper is theoretical and does not describe experimental validation with specific training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers.
Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values or training configurations.
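
For readers who want a concrete picture of the streaming Lloyd's heuristic mentioned in the Research Type entry, the following is a minimal sketch, not the paper's pseudocode. It assumes a generic single-pass update: each incoming point is assigned to its nearest current center, and only that center is nudged toward the point. The function names `streaming_lloyd` and `sample_mixture`, the fixed step size `eta`, the toy centers, and the noisy initialization are all illustrative assumptions; the paper's own initialization procedure and learning-rate choice are not reproduced here.

```python
import numpy as np


def streaming_lloyd(stream, init_centers, eta):
    """Single pass over `stream`, updating only the nearest center per point.

    Hypothetical helper: `eta` is an assumed fixed step size, not the
    learning rate analyzed in the paper.
    """
    centers = np.array(init_centers, dtype=float)  # copy, shape (k, d)
    for x in stream:
        x = np.asarray(x, dtype=float)
        # Hard assignment: index of the closest current center.
        j = int(np.argmin(np.sum((centers - x) ** 2, axis=1)))
        # Move that center a fraction eta of the way toward the point.
        centers[j] = (1.0 - eta) * centers[j] + eta * x
    return centers


def sample_mixture(rng, true_centers, sigma, n):
    """Draw n points from a uniform mixture of spherical Gaussians."""
    k, d = true_centers.shape
    labels = rng.integers(0, k, size=n)
    return true_centers[labels] + sigma * rng.standard_normal((n, d))


# Toy setup (assumed values): three well-separated centers in 2 dimensions.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
points = sample_mixture(rng, true_centers, sigma=1.0, n=5000)

# Assumed rough initialization: true centers plus unit-variance noise.
init = true_centers + rng.standard_normal(true_centers.shape)

estimate = streaming_lloyd(points, init, eta=0.01)
max_error = float(np.max(np.linalg.norm(estimate - true_centers, axis=1)))
```

In this sketch a larger `eta` forgets the initialization faster but leaves more noise in the final estimate, while a smaller `eta` does the opposite; this is only an informal illustration of the kind of trade-off a bias-variance bound describes.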