Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Robust Model Selection and Nearly-Proper Learning for GMMs
Authors: Allen Liu, Jerry Li, Ankur Moitra
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we study the problem of robust model selection for univariate Gaussian mixture models (GMMs). Given poly(k/ϵ) samples from a distribution that is ϵ-close in TV distance to a GMM with k components, we can construct a GMM with e O(k) components that approximates the distribution to within e O(ϵ) in poly(k/ϵ) time. Thus we are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than k components (e.g. k/ϵ2 components for kernel density estimates) or run in time exponential in k. |
| Researcher Affiliation | Collaboration | Allen Liu MIT Cambridge, MA 02139 EMAIL Jerry Li Microsoft Research Redmond, WA 98052 EMAIL Ankur Moitra MIT Cambridge, MA 02139 EMAIL |
| Pseudocode | Yes | We show how to eliminate the need for pdf access below and present our full algorithm in complete detail in Section C. ... In Section C, we provide Algorithm 1: Learning GMMs, which details the steps of our approach. |
| Open Source Code | No | The paper is theoretical and focuses on algorithms and proofs; it does not mention the release of any source code. |
| Open Datasets | No | The paper is theoretical and does not use or provide access to specific datasets for training or evaluation. The checklist states 'N/A' for 'Did you include the code, data, and instructions needed to reproduce the main experimental results'. |
| Dataset Splits | No | The paper is theoretical and does not report on experiments, therefore no dataset splits for training, validation, or testing are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers for implementation or experimental setup. |
| Experiment Setup | No | The paper is theoretical and does not report on experiments, thus no experimental setup details such as hyperparameters or training settings are provided. |