Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations

Authors: Alexander Ritchie, Robert A. Vandermeulen, Clayton Scott

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we compare our coreset approach against several competing methods on a number of real and highly overlapping synthetic datasets.
Researcher Affiliation | Academia | Alexander Ritchie, Department of EECS, University of Michigan, Ann Arbor, MI 48109, aritch@umich.edu; Robert A. Vandermeulen, ML group, Technische Universität Berlin, 10587 Berlin, Germany, vandermeulen@tu-berlin.de; Clayton Scott, Departments of EECS, Statistics, University of Michigan, Ann Arbor, MI 48109, clayscot@umich.edu
Pseudocode | Yes | Pseudocode for the APSGD algorithm for solving (6) is given in the supplementary material.
Open Source Code | No | All code and synthetic datasets are publicly available [footnote 2]. [Footnote 2: Authors GitHub link to go here in final version.]
Open Datasets | Yes | All code and synthetic datasets are publicly available [footnote 2]. The MAGIC gamma ray detection dataset [33] is publicly available via the UCI machine learning repository. The Russian-troll-tweets Twitter dataset is publicly available through FiveThirtyEight [footnote 3].
Dataset Splits | Yes | Each method was trained using 80% of the available data, and the ROC curve was generated from the remaining 20%.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility.
Experiment Setup | Yes | For synthetic experiments, R was selected to yield the initialization with the lowest empirical TISE. R was chosen from {10, 20, 30, 40, 50} for both moons datasets, and from {60, 70, 80, 90, 100} for the Olympic rings and half-disks datasets. We used R = 200 for the MAGIC and Twitter datasets.
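
The split and selection procedure quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_80_20`, `roc_points`, and `select_R` are hypothetical, the shuffling seed is an assumption, and `score_fn` is a stand-in for the empirical TISE, whose computation is not described in this summary.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in an 80/20 split."""
    rng = random.Random(seed)  # the seed is an assumption, not from the paper
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = (4 * len(shuffled)) // 5
    return shuffled[:cut], shuffled[cut:]


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for the ROC curve.

    `labels` holds 1 for positives and 0 for negatives; both classes must be
    present, otherwise a rate would require division by zero.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item that shares this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def select_R(candidates, score_fn):
    """Pick the candidate R with the lowest score (stand-in for empirical TISE)."""
    return min(candidates, key=score_fn)
```

For example, `select_R([10, 20, 30, 40, 50], score_fn)` mirrors the grid quoted for the moons datasets, with `score_fn` supplied by whoever reproduces the experiment.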