Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Infinite Mixture of Infinite Gaussian Mixtures
Authors: Halid Z Yerebakan, Bartek Rajwa, Murat Dundar
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several artificial and real-world data sets suggest the proposed I2GMM model can predict clusters more accurately than existing variational Bayes and Gibbs sampler versions of DPMG. |
| Researcher Affiliation | Academia | Halid Z. Yerebakan, Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202; Bartek Rajwa, Bindley Bioscience Center, Purdue University, W. Lafayette, IN 47907; Murat Dundar, Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202 |
| Pseudocode | No | The paper describes the generative model and inference steps using mathematical equations, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | I2GMM is implemented in C++. The source files and executables are available on the web (https://github.com/halidziya/I2GMM). |
| Open Datasets | Yes | Lymphoma: The Lymphoma data set is one of the data sets used in the FlowCAP (Flow Cytometry Critical Assessment of Population Identification Methods) 2010 competition [1]. |
| Dataset Splits | No | The paper refers to training for MCMC sampling (“run for 1500 sweeps”, “1000 samples are ignored as burn-in”) but does not provide specific train/validation/test dataset splits for reproducibility. |
| Hardware Specification | Yes | The largest gain by parallelization is obtained on the rare classes data set which offered almost two-fold increase by parallelization on an eight-core workstation. |
| Software Dependencies | No | The paper mentions “C++” and “MATLAB” for implementations but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use vague priors with α and γ by fixing their value to one. We set m to the minimum feasible value, which is d+2... We use s = 150/(d log d), κ₀ = 0.05, and κ₁ = 0.5 in experiments with all five data sets described above. |
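
The settings quoted in the Experiment Setup row depend only on the data dimensionality d, so they can be tabulated directly. The following is a minimal sketch, not the authors' code: the function name `i2gmm_hyperparameters` and the dictionary keys are hypothetical, and the natural logarithm is an assumption, since the quoted text does not state the base of the log.

```python
import math


def i2gmm_hyperparameters(d: int) -> dict:
    """Return the hyperparameter values quoted in the Experiment Setup row.

    Hypothetical helper for illustration only; names are not from the paper.
    """
    if d < 2:
        # log(1) == 0 would make s undefined (division by zero).
        raise ValueError("d must be at least 2")
    return {
        "alpha": 1.0,                    # fixed to one (vague prior)
        "gamma": 1.0,                    # fixed to one (vague prior)
        "m": d + 2,                      # minimum feasible value
        "s": 150.0 / (d * math.log(d)),  # ASSUMPTION: natural log
        "kappa0": 0.05,
        "kappa1": 0.5,
    }


if __name__ == "__main__":
    # Print the settings for a few example dimensionalities.
    for d in (2, 5, 10):
        print(d, i2gmm_hyperparameters(d))
```

If the paper intends a different log base, only the `s` entry changes; the remaining values are constants or depend linearly on d.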