Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

Authors: Weihang Xu, Maryam Fazel, Simon S. Du

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we experimentally explore the behavior of gradient EM on GMMs.
Researcher Affiliation | Academia | Weihang Xu, University of Washington, xuwh@cs.washington.edu; Maryam Fazel, University of Washington, mfazel@uw.edu; Simon S. Du, University of Washington, ssdu@cs.washington.edu
Pseudocode | No | The paper describes the Gradient EM algorithm in text (Section 1.2) but does not include a formal pseudocode block or algorithm box; see the gradient EM sketch after this table.
Open Source Code | No | Q5: Does the paper provide open access to the data and code...? A: [No] Justification: We only run a small-scale experiment to verify an optimization phenomenon in our theory.
Open Datasets | No | Section 5 'Experiments' states: 'We use n = 2, 5, 10 Gaussian mixtures to learn data generated from one single ground truth Gaussian distribution N(µ*, I_d), respectively.' This indicates the data are synthetic and generated for the experiments, but no information on public availability of or access to this data is provided.
Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits. The experiments are conducted on synthetically generated data to observe convergence behavior, with likelihood estimated via Monte Carlo on fresh samples each iteration.
Hardware Specification | No | Q8: For each experiment, does the paper provide sufficient information on the computer resources...? A: [No] Justification: Our experiment only shows the phenomenon on small-scale synthetic data, so we did not record the computation resource used.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | We choose the experimental setting of d = 5, η = 0.7. We use n = 2, 5, 10 Gaussian mixtures to learn data generated from one single ground truth Gaussian distribution N(µ*, I_d), respectively. ... The mixing weights of student GMM are randomly sampled from a standard Dirichlet distribution and set as fixed during gradient EM update. The covariances of all component Gaussians are set as the identity matrix. ... we approximate the gradient step via Monte Carlo method, with sample size 3.5 × 10^5. A sketch replicating this setup follows the gradient EM sketch below.
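
Since the Pseudocode row notes that gradient EM is only described in prose (Section 1.2), the following is a minimal Python sketch of the standard gradient EM update for a GMM with identity covariances and fixed mixing weights, i.e. µ_i ← µ_i + η · (1/N) Σ_j w_ij (x_j − µ_i), where w_ij are the posterior responsibilities. This is a reference implementation of the generic update under those assumptions, not the paper's code; all function names are illustrative.

```python
import numpy as np

def responsibilities(X, means, weights):
    """Posterior probability w_ij that sample x_j was drawn from component i,
    for a GMM with identity covariances (normalizing constants cancel)."""
    # ||x_j - mu_i||^2 expanded to avoid an (n_components, N, d) intermediate
    sq_dist = ((means ** 2).sum(axis=1)[:, None]
               - 2.0 * means @ X.T
               + (X ** 2).sum(axis=1)[None, :])
    log_w = np.log(weights)[:, None] - 0.5 * sq_dist
    log_w -= log_w.max(axis=0, keepdims=True)   # stabilize the softmax
    w = np.exp(log_w)
    return w / w.sum(axis=0, keepdims=True)     # shape (n_components, N)

def gradient_em_step(X, means, weights, eta):
    """One gradient EM step on the component means; weights and covariances stay fixed."""
    w = responsibilities(X, means, weights)
    N = X.shape[0]
    # grad_i = (1/N) * sum_j w_ij (x_j - mu_i), computed without a large 3-D array
    grad = (w @ X) / N - means * w.mean(axis=1, keepdims=True)
    return means + eta * grad
```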
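
The Experiment Setup row quotes d = 5, η = 0.7, n ∈ {2, 5, 10} student components, Dirichlet mixing weights kept fixed, identity covariances, and a Monte Carlo sample size of 3.5 × 10^5. Below is a hedged sketch of how that setup could be replicated, reusing gradient_em_step from the block above. The iteration budget, random seed, initialization of the student means, and the printed convergence metric are assumptions added for illustration; they are not stated in the quoted text.

```python
import numpy as np

d, eta = 5, 0.7                  # dimension and step size from the quoted setup
n_list = [2, 5, 10]              # numbers of student components
mc_samples = 350_000             # Monte Carlo sample size (3.5 x 10^5)
n_iters = 200                    # iteration budget: an assumption
rng = np.random.default_rng(0)   # seed: an assumption

mu_star = rng.standard_normal(d)  # ground-truth mean of N(mu*, I_d)

for n in n_list:
    weights = rng.dirichlet(np.ones(n))    # standard Dirichlet, then kept fixed
    means = rng.standard_normal((n, d))    # random initialization (assumed)
    for _ in range(n_iters):
        # fresh samples each iteration approximate the population gradient step
        X = mu_star + rng.standard_normal((mc_samples, d))
        means = gradient_em_step(X, means, weights, eta)
    # illustrative convergence metric: worst-case distance of a student mean to mu*
    err = np.linalg.norm(means - mu_star, axis=1).max()
    print(f"n={n}: max_i ||mu_i - mu*|| = {err:.4f}")
```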