reproducibilityindex.ai

The Infinite Mixture of Infinite Gaussian Mixtures

Authors: Halid Z Yerebakan, Bartek Rajwa, Murat Dundar

NeurIPS 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on several artificial and real-world data sets suggest the proposed I2GMM model can predict clusters more accurately than existing variational Bayes and Gibbs sampler versions of DPMG.
Researcher Affiliation	Academia	Halid Z. Yerebakan Department of Computer and Information Science IUPUI Indianapolis, IN 46202 hzyereba@cs.iupui.edu Bartek Rajwa Bindley Bioscience Center Purdue University W. Lafayette, IN 47907 rajwa@cyto.purdue.edu Murat Dundar Department of Computer and Information Science IUPUI Indianapolis, IN 46202 dundar@cs.iupui.edu
Pseudocode	No	The paper describes the generative model and inference steps using mathematical equations, but does not provide structured pseudocode or an algorithm block.
Open Source Code	Yes	I2GMM is implemented in C++. The source files and executables are available on the web. Footnote 2: https://github.com/halidziya/I2GMM
Open Datasets	Yes	Lymphoma: Lymphoma data set is one of the data sets used in the FlowCAP (Flow Cytometry Critical Assessment of Population Identification Methods) 2010 competition [1].
Dataset Splits	No	The paper describes the MCMC sampling schedule (“run for 1500 sweeps”, “1000 samples are ignored as burn-in”) but does not provide specific train/validation/test dataset splits for reproducibility.
Hardware Specification	Yes	The largest gain by parallelization is obtained on the rare classes data set which offered almost two-fold increase by parallelization on an eight-core workstation.
Software Dependencies	No	The paper mentions “C++” and “MATLAB” for implementations but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup	Yes	We use vague priors with α and γ by fixing their value to one. We set m to the minimum feasible value, which is d+2... We use s = 150/(d(log d)), κ0 = 0.05, and κ1 = 0.5 in experiments with all five data sets described above.
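
The settings quoted in the Experiment Setup and Dataset Splits rows depend only on the data dimensionality d, so they can be collected in one place. The following is a minimal sketch, not the authors' C++ code: the class name `I2GMMSettings` and its field names are hypothetical, the logarithm is assumed to be natural (the quoted text does not state the base), and the retained-sample count assumes one sample per sweep.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class I2GMMSettings:
    """Hypothetical container for the settings quoted in the table above."""

    d: int                 # data dimensionality
    alpha: float = 1.0     # concentration parameter, fixed to one
    gamma: float = 1.0     # concentration parameter, fixed to one
    kappa0: float = 0.05   # quoted value of kappa_0
    kappa1: float = 0.5    # quoted value of kappa_1
    sweeps: int = 1500     # total sampler sweeps
    burn_in: int = 1000    # samples ignored as burn-in

    def __post_init__(self):
        # log(d) must be positive for s to be finite and positive.
        if self.d < 2:
            raise ValueError("d must be at least 2 so that log(d) > 0")

    @property
    def m(self) -> int:
        # Minimum feasible value stated in the quoted text: d + 2.
        return self.d + 2

    @property
    def s(self) -> float:
        # s = 150 / (d * log d); natural log is an assumption.
        return 150.0 / (self.d * math.log(self.d))

    @property
    def retained_samples(self) -> int:
        # Assumes each sweep yields one sample.
        return self.sweeps - self.burn_in


settings = I2GMMSettings(d=3)
print(settings.m, settings.s, settings.retained_samples)
```

Deriving m and s from d, rather than storing them, keeps the two values consistent if the same settings are reused across data sets of different dimensionality.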