Learning Mixture of Gaussians with Streaming Data
Authors: Aditi Raghunathan, Prateek Jain, Ravishankar Krishnaswamy
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study the problem of learning a mixture of Gaussians with streaming data: given a stream of N points in d dimensions generated by an unknown mixture of k spherical Gaussians, the goal is to estimate the model parameters using a single pass over the data stream. We analyze a streaming version of the popular Lloyd's heuristic and show that the algorithm estimates all the unknown centers of the component Gaussians accurately if they are sufficiently separated. Our main contribution is the first bias-variance bound for the problem of learning Gaussian mixtures with streaming data. |
| Researcher Affiliation | Collaboration | Aditi Raghunathan, Stanford University, aditir@stanford.edu; Prateek Jain, Microsoft Research, India, prajain@microsoft.com; Ravishankar Krishnaswamy, Microsoft Research, India, rakri@microsoft.com |
| Pseudocode | Yes | Algorithm 1 InitAlg(N0) ... Algorithm 2 StreamKmeans(N, N0) ... Algorithm 3 StreamSoftUpdate(N, N0) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes a synthetic data generation model ('mixture of k spherical Gaussians distributions') but does not specify or provide access information for any publicly available or open dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation with specific training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values or training configurations. |
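
The streaming Lloyd's heuristic summarized in the Research Type row can be illustrated with a minimal sketch. This is not the paper's Algorithm 2: the function names (`nearest_center`, `stream_kmeans`, `sample_mixture`), the fixed learning rate `eta`, the hand-picked initial centers, and the synthetic two-component mixture are all assumptions made for illustration. The paper's own initialization procedure and step-size schedule are not reproduced here.

```python
import random


def nearest_center(centers, x):
    """Return the index of the center closest to x (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(centers):
        dist = sum((ci - xi) ** 2 for ci, xi in zip(c, x))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def stream_kmeans(stream, init_centers, eta):
    """Single pass over the stream: move the nearest center toward each point.

    eta is a fixed learning rate (an assumption of this sketch).
    """
    centers = [list(c) for c in init_centers]
    for x in stream:
        i = nearest_center(centers, x)
        centers[i] = [(1 - eta) * ci + eta * xi for ci, xi in zip(centers[i], x)]
    return centers


def sample_mixture(true_centers, sigma, n, rng):
    """Yield n points from a uniform mixture of spherical Gaussians."""
    for _ in range(n):
        mu = rng.choice(true_centers)
        yield [rng.gauss(m, sigma) for m in mu]


# Hypothetical setup: two well-separated components in d = 2.
rng = random.Random(0)
true_centers = [[0.0, 0.0], [10.0, 10.0]]
init = [[1.0, -1.0], [9.0, 11.0]]  # rough initial guesses (assumed, not learned)
estimate = stream_kmeans(
    sample_mixture(true_centers, 1.0, 5000, rng), init, eta=0.01
)
```

Each point is seen once and then discarded, so memory use is independent of the stream length. In a sketch like this, a smaller `eta` reduces the noise in the final estimate but slows the decay of the initialization error, which is the kind of trade-off a bias-variance bound describes.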