Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
Authors: Weihang Xu, Maryam Fazel, Simon S. Du
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we experimentally explore the behavior of gradient EM on GMMs. |
| Researcher Affiliation | Academia | Weihang Xu, University of Washington (xuwh@cs.washington.edu); Maryam Fazel, University of Washington (mfazel@uw.edu); Simon S. Du, University of Washington (ssdu@cs.washington.edu) |
| Pseudocode | No | The paper describes the Gradient EM algorithm in text (Section 1.2) but does not include a formal pseudocode block or algorithm box. |
| Open Source Code | No | Q5: Does the paper provide open access to the data and code...? A: [No] Justification: We only run a small-scale experiment to verify an optimization phenomenon in our theory. |
| Open Datasets | No | Section 5 'Experiments' states: 'We use n = 2, 5, 10 Gaussian mixtures to learn data generated from one single ground truth Gaussian distribution N(µ*, I_d), respectively.' This indicates synthetic data generated for the experiments, but no information on public availability or access for this data is provided. |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits. The experiments are conducted on synthetically generated data to observe convergence behavior, with likelihood estimated via Monte Carlo on fresh samples each iteration. |
| Hardware Specification | No | Q8: For each experiment, does the paper provide sufficient information on the computer resources...? A: [No] Justification: Our experiment only shows the phenomenon on small-scale synthetic data, so we did not record the computation resource used. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | We choose the experimental setting of d = 5, η = 0.7. We use n = 2, 5, 10 Gaussian mixtures to learn data generated from one single ground truth Gaussian distribution N(µ*, I_d), respectively. ... The mixing weights of student GMM are randomly sampled from a standard Dirichlet distribution and set as fixed during gradient EM update. The covariances of all component Gaussians are set as the identity matrix. ... we approximate the gradient step via Monte Carlo method, with sample size 3.5 × 10^5. (A hedged code sketch of this setup follows the table.) |
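Since the authors released no code (see the Open Source Code row), the following is a minimal sketch of the described setup, not their implementation: gradient EM updating the means of an n-component student GMM with fixed Dirichlet mixing weights and identity covariances, fit to data from a single ground-truth Gaussian N(µ*, I_d), with both the gradient step and the log-likelihood approximated by Monte Carlo on fresh samples each iteration. The iteration count `T`, the random initializations of `mu_star` and `mu`, and the logging cadence are assumptions not specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 5, 5            # dimension and number of student components (paper: d = 5; n in {2, 5, 10})
eta = 0.7              # step size (paper: eta = 0.7)
m = 350_000            # Monte Carlo sample size (paper: 3.5 x 10^5)
T = 200                # iteration count -- an assumption; not specified in the paper

mu_star = rng.standard_normal(d)        # ground-truth mean (assumed initialization)
w = rng.dirichlet(np.ones(n))           # mixing weights ~ standard Dirichlet, then held fixed
mu = rng.standard_normal((n, d))        # student means (assumed initialization)

def log_likelihood_and_resp(X, mu, w):
    """Average log-likelihood of X under the student GMM, and responsibilities
    r_i(x) = w_i N(x; mu_i, I) / sum_j w_j N(x; mu_j, I)."""
    # log w_i + log N(x; mu_i, I), deferring the shared -(d/2) log(2*pi) term
    log_wphi = np.log(w)[None, :] - 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    mx = log_wphi.max(axis=1, keepdims=True)
    lse = mx + np.log(np.exp(log_wphi - mx).sum(axis=1, keepdims=True))  # stable logsumexp
    r = np.exp(log_wphi - lse)                                           # responsibilities
    ll = (lse.ravel() - 0.5 * X.shape[1] * np.log(2 * np.pi)).mean()
    return ll, r

for t in range(T):
    # fresh Monte Carlo sample from the single ground-truth Gaussian N(mu*, I_d)
    X = mu_star + rng.standard_normal((m, d))
    ll, r = log_likelihood_and_resp(X, mu, w)
    # gradient of the sample log-likelihood w.r.t. each mean:
    #   grad_i = (1/m) * sum_t r_i(x_t) (x_t - mu_i)
    grad = (r[:, :, None] * (X[:, None, :] - mu[None, :, :])).mean(axis=0)
    mu += eta * grad      # gradient EM ascent step; mixing weights stay fixed
    if t % 20 == 0:
        print(f"iter {t:4d}  Monte Carlo log-likelihood {ll:.4f}")
```

The log-sum-exp shift keeps the responsibilities numerically stable when components are far apart, and drawing a fresh sample `X` every iteration mirrors the report's note that the likelihood is estimated via Monte Carlo on fresh samples.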