Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Infinite Mixture of Infinite Gaussian Mixtures

Authors: Halid Z Yerebakan, Bartek Rajwa, Murat Dundar

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results on several artificial and real-world data sets suggest the proposed I2GMM model can predict clusters more accurately than existing variational Bayes and Gibbs sampler versions of DPMG. |
| Researcher Affiliation | Academia | Halid Z. Yerebakan, Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202; Bartek Rajwa, Bindley Bioscience Center, Purdue University, W. Lafayette, IN 47907; Murat Dundar, Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202 |
| Pseudocode | No | The paper describes the generative model and inference steps using mathematical equations, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | I2GMM is implemented in C++. The source files and executables are available on the web. https://github.com/halidziya/I2GMM |
| Open Datasets | Yes | Lymphoma: Lymphoma data set is one of the data sets used in the FlowCAP (Flow Cytometry Critical Assessment of Population Identification Methods) 2010 competition [1]. |
| Dataset Splits | No | The paper refers to training for MCMC sampling (“run for 1500 sweeps”, “1000 samples are ignored as burn-in”) but does not provide specific train/validation/test dataset splits for reproducibility. |
| Hardware Specification | Yes | The largest gain by parallelization is obtained on the rare classes data set which offered almost two-fold increase by parallelization on an eight-core workstation. |
| Software Dependencies | No | The paper mentions “C++” and “MATLAB R” for implementations but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use vague priors with α and γ by fixing their value to one. We set m to the minimum feasible value, which is d+2... We use s = 150/(d(log d)), κ0 = 0.05, and κ1 = 0.5 in experiments with all five data sets described above. |
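
The settings quoted in the Experiment Setup and Dataset Splits rows can be collected into a small configuration helper. The sketch below is illustrative only and is not taken from the authors' C++ implementation: the names `I2GMMSettings` and `settings_for_dimension` are hypothetical, the logarithm in s = 150/(d log d) is assumed to be the natural log, and the 1500-sweep and 1000-sample burn-in figures are assumed to apply to a single sampler run.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class I2GMMSettings:
    """Hypothetical container for the settings quoted in the table above."""
    alpha: float   # concentration parameter, fixed to one in the quoted text
    gamma: float   # concentration parameter, fixed to one in the quoted text
    m: int         # set to the minimum feasible value, d + 2
    s: float       # s = 150 / (d * log d)
    kappa0: float  # quoted as 0.05
    kappa1: float  # quoted as 0.5
    sweeps: int    # total sampler sweeps, quoted as 1500
    burn_in: int   # sweeps ignored as burn-in, quoted as 1000

    @property
    def retained_sweeps(self) -> int:
        # Sweeps left after discarding burn-in.
        return self.sweeps - self.burn_in


def settings_for_dimension(d: int) -> I2GMMSettings:
    """Build the quoted settings for data of dimension d.

    Assumption: log is the natural logarithm; the quoted text does not say.
    """
    if d < 2:
        # log(1) == 0 would make s undefined, and log(d) < 0 for d < 1.
        raise ValueError("d must be at least 2 so that log(d) is positive")
    return I2GMMSettings(
        alpha=1.0,
        gamma=1.0,
        m=d + 2,
        s=150.0 / (d * math.log(d)),
        kappa0=0.05,
        kappa1=0.5,
        sweeps=1500,
        burn_in=1000,
    )
```

For example, `settings_for_dimension(4)` gives `m = 6` and leaves 500 sweeps after burn-in. If the paper intends a different log base, only the `s` line would need to change.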