The Infinite Mixture of Infinite Gaussian Mixtures

Authors: Halid Z Yerebakan, Bartek Rajwa, Murat Dundar

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on several artificial and real-world data sets suggest that the proposed I2GMM model can predict clusters more accurately than existing variational Bayes and Gibbs sampler versions of DPMG.
Researcher Affiliation | Academia | Halid Z. Yerebakan, Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202, hzyereba@cs.iupui.edu; Bartek Rajwa, Bindley Bioscience Center, Purdue University, W. Lafayette, IN 47907, rajwa@cyto.purdue.edu; Murat Dundar, Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202, dundar@cs.iupui.edu
Pseudocode | No | The paper describes the generative model and inference steps using mathematical equations but does not provide structured pseudocode or an algorithm block.
Open Source Code | Yes | I2GMM is implemented in C++. The source files and executables are available on the web: https://github.com/halidziya/I2GMM
Open Datasets | Yes | Lymphoma: The lymphoma data set is one of the data sets used in the FlowCAP (Flow Cytometry Critical Assessment of Population Identification Methods) 2010 competition [1].
Dataset Splits | No | The paper specifies the MCMC sampling schedule (“run for 1500 sweeps”, “1000 samples are ignored as burn-in”) but does not provide train/validation/test data set splits for reproducibility.
Hardware Specification | Yes | The largest gain from parallelization is obtained on the rare-classes data set, which showed an almost two-fold speedup on an eight-core workstation.
Software Dependencies | No | The paper mentions “C++” and “MATLAB” for its implementations but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | We use vague priors for α and γ by fixing their values to one. We set m to the minimum feasible value, which is d + 2... We use s = 150/(d log d), κ₀ = 0.05, and κ₁ = 0.5 in experiments with all five data sets described above.
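The experiment setup above derives every hyperparameter either as a constant or from the data dimensionality d, so it can be collected into one small helper. This is a hedged sketch, not code from the I2GMM repository. The names `I2GMMHyperparams` and `default_hyperparams` are hypothetical. The logarithm in s is assumed to be natural. Reading α and γ as the concentration parameters of the two Dirichlet processes is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class I2GMMHyperparams:
    """Hypothetical container for the settings listed in the experiment setup."""
    alpha: float   # assumed: concentration parameter, fixed to one
    gamma: float   # assumed: concentration parameter, fixed to one
    m: int         # minimum feasible value, d + 2
    s: float       # scale, 150 / (d log d)
    kappa0: float
    kappa1: float


def default_hyperparams(d: int) -> I2GMMHyperparams:
    """Build the reported settings for d-dimensional data (natural log assumed)."""
    if d < 2:
        # For d = 1, d * log(d) = 0 and s would be undefined.
        raise ValueError("d must be >= 2 so that d * log(d) > 0")
    return I2GMMHyperparams(
        alpha=1.0,
        gamma=1.0,
        m=d + 2,
        s=150.0 / (d * math.log(d)),
        kappa0=0.05,
        kappa1=0.5,
    )


if __name__ == "__main__":
    hp = default_hyperparams(5)
    print(hp.m, round(hp.s, 2))  # prints "7 18.64"
```

The guard on d follows from the formula itself. When d = 1, d log d is zero and s is undefined, so the sketch rejects d < 2 instead of dividing by zero.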