Learning Mixtures of Gaussians Using the DDPM Objective
Authors: Kulin Shah, Sitan Chen, Adam Klivans
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1. We show gradient descent with random initialization learns mixtures of two spherical Gaussians in d dimensions with 1/poly(d)-separated centers. 2. We show gradient descent with a warm start learns mixtures of K spherical Gaussians with Ω̃(√log(min(K, d)))-separated centers. (A hedged sketch of gradient descent on this objective appears below the table.) |
| Researcher Affiliation | Academia | Kulin Shah, UT Austin, kulinshah@utexas.edu; Sitan Chen, Harvard University, sitan@seas.harvard.edu; Adam Klivans, UT Austin, klivans@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1: GMMDENOISER(t, {μ_i^(0)}_{i=1}^K, H) (a hedged reconstruction of the denoiser this routine parameterizes appears below the table) |
| Open Source Code | No | The paper is theoretical and does not mention any open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on proofs and algorithms. It refers to data distributions and samples only in a theoretical context and does not use, or provide access information for, any specific publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical proofs and algorithmic analysis. It does not describe an experimental setup, hyperparameters, or training configurations. |
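
The first setting quoted in the Research Type row trains by running gradient descent on the DDPM denoising objective. The following is a minimal runnable sketch of that idea, not the paper's exact algorithm: it assumes the symmetric two-component instance ½ N(μ*, I) + ½ N(−μ*, I), a single fixed noise scale t of the Ornstein–Uhlenbeck forward process x_t = e^(−t) x_0 + √(1 − e^(−2t)) z, and a student denoiser with the ground-truth functional form. The names (`sample_batch`, `denoiser`, `grad_ddpm`) and all constants (dimension, step size, iteration count) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16                                       # ambient dimension (illustrative)
mu_star = rng.normal(size=d)
mu_star /= np.linalg.norm(mu_star)           # ground-truth center, unit norm
t = 0.5                                      # fixed noise scale of the OU forward process
a = np.exp(-t)                               # signal coefficient e^{-t}
s2 = 1.0 - a ** 2                            # noise variance 1 - e^{-2t}

def sample_batch(n):
    """Draw x0 from (1/2) N(mu*, I) + (1/2) N(-mu*, I) and diffuse it to time t."""
    signs = rng.choice([-1.0, 1.0], size=(n, 1))
    x0 = signs * mu_star + rng.normal(size=(n, d))
    xt = a * x0 + np.sqrt(s2) * rng.normal(size=(n, d))
    return x0, xt

def denoiser(mu, xt):
    """Posterior mean E[x0 | xt] under the symmetric two-component model with center mu."""
    u = a * (xt @ mu)                        # per-sample inner product a <mu, xt>
    return a * xt + s2 * np.tanh(u)[:, None] * mu

def grad_ddpm(mu, x0, xt):
    """Stochastic gradient of the DDPM loss E ||denoiser(mu, xt) - x0||^2 in mu."""
    u = a * (xt @ mu)
    tanh_u = np.tanh(u)
    r = denoiser(mu, xt) - x0                # residuals, shape (n, d)
    sech2 = 1.0 - tanh_u ** 2                # derivative of tanh
    g = tanh_u[:, None] * r + a * (sech2 * (r @ mu))[:, None] * xt
    return 2.0 * s2 * g.mean(axis=0)

mu = rng.normal(size=d) / np.sqrt(d)         # random initialization
for step in range(2000):
    x0, xt = sample_batch(256)
    mu -= 0.1 * grad_ddpm(mu, x0, xt)        # plain gradient descent

# the symmetric mixture is only identifiable up to sign
err = min(np.linalg.norm(mu - mu_star), np.linalg.norm(mu + mu_star))
print(f"parameter error up to sign: {err:.3f}")
```

The closed form in `denoiser` is the exact posterior mean of the two-component model at noise scale t (using a² + (1 − e^(−2t)) = 1, so x_t is itself a unit-covariance mixture with centers ±a μ), and the recovery error is reported up to the sign ambiguity inherent in the symmetric instance.
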
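Algorithm 1 is only named in the Pseudocode row, so the sketch below is a hedged reconstruction of the object such a routine would plausibly parameterize: the posterior-mean denoiser E[x0 | xt] of an equal-weight mixture of K spherical Gaussians with identity covariances, under the same OU forward process. The signature mirrors GMMDENOISER(t, {μ_i^(0)}_{i=1}^K, H), but `gmm_denoiser`, the equal-weight and identity-covariance assumptions, and every constant in the usage snippet are mine; H, presumably the number of gradient steps taken from the warm start, is not exercised here since the sketch only evaluates the denoiser and its empirical DDPM loss.

```python
import numpy as np

def gmm_denoiser(t, mus, xt):
    """
    Posterior-mean denoiser E[x0 | xt] for an equal-weight mixture of K
    spherical Gaussians N(mu_i, I) diffused to time t by the OU process
    xt = e^{-t} x0 + sqrt(1 - e^{-2t}) z.

    mus: (K, d) array of candidate centers; xt: (n, d) batch of noisy points.
    """
    a = np.exp(-t)
    s2 = 1.0 - a ** 2
    # log posterior weights of each component given xt (softmax over K)
    logits = a * (xt @ mus.T) - 0.5 * a ** 2 * np.sum(mus ** 2, axis=1)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # E[x0 | xt] = a * xt + (1 - a^2) * sum_i w_i(xt) * mu_i
    return a * xt + s2 * (w @ mus)

# Usage: empirical DDPM loss at a (hypothetical) warm start.
rng = np.random.default_rng(1)
K, d, n = 4, 8, 512
mus = 3.0 * rng.normal(size=(K, d))                  # candidate centers (warm start)
idx = rng.integers(K, size=n)
x0 = mus[idx] + rng.normal(size=(n, d))              # draw x0 from the mixture
t = 0.3
xt = np.exp(-t) * x0 + np.sqrt(1 - np.exp(-2 * t)) * rng.normal(size=(n, d))
loss = np.mean(np.sum((gmm_denoiser(t, mus, xt) - x0) ** 2, axis=1))
print(f"empirical DDPM loss at the warm start: {loss:.3f}")
```

Under these assumptions the warm-start setting would run H gradient steps of the DDPM loss through this denoiser, starting from the initial centers {μ_i^(0)}; that outer loop is omitted here because its exact schedule is not recoverable from the extracted row.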