Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Distributed Inference for Dirichlet Process Mixture Models
Authors: Hong Ge, Yutian Chen, Moquan Wan, Zoubin Ghahramani
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide both local thread-level and distributed machine-level parallel implementations and study the performance of this sampler through an extensive set of experiments on image and text data. |
| Researcher Affiliation | Academia | Hong Ge, Yutian Chen, Moquan Wan, Zoubin Ghahramani; Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK |
| Pseudocode | Yes | Algorithm 1: The M R Sampler for DP |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We study the performance of the DP sampler on two data sets: the MNIST digit images and CIFAR-10 natural colour images with standard pre-processing steps. (...) The performance of the proposed M R sampler for the HDP mixture model is evaluated on the NIPS corpus (1.9 million words) and a subset of the Wikipedia corpus constructed by randomly selecting 10⁵ documents (roughly 40 million words). |
| Dataset Splits | No | The performance is measured in terms of predictive perplexities on 10% separate hold-out test documents for both the NIPS and Wikipedia datasets. This specifies a test split but gives no details of the training/validation splits or the exact sample counts needed to reproduce all splits. |
| Hardware Specification | Yes | Both experiments are performed using Amazon EC2 instances with up to 32 cores. For experiments with more than 32 cores, we use a cluster of c3.8xlarge instances each with 32 cores. |
| Software Dependencies | No | The paper describes the algorithms and models used but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For all experiments, we initialise the concentration parameter α ∼ G(1, 1), and randomly assign all the observations into 50 clusters. (...) For the FSD, a truncation level of 100 is used in all the experiments. |
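
The setup and split details quoted in the table can be made concrete with a short sketch. It is not the authors' code (none is released) and rests on assumptions: the helper names `init_dpmm_state` and `holdout_split` are hypothetical, the 10% hold-out is drawn by a uniform random permutation of document indices (the paper does not say how test documents were chosen), and `FSD_TRUNCATION` is only a named constant for the reported value. For G(1, 1) the shape/scale versus shape/rate convention does not matter, since both give an Exponential(1) distribution.

```python
import numpy as np

# Reported truncation level for the FSD, kept here only as a named constant.
FSD_TRUNCATION = 100


def init_dpmm_state(num_obs, num_init_clusters=50, seed=0):
    """Hypothetical helper: draw alpha ~ Gamma(1, 1) and assign every
    observation uniformly at random to one of num_init_clusters clusters."""
    rng = np.random.default_rng(seed)
    # Gamma(shape=1, scale=1) is Exponential(1); identical under the rate convention.
    alpha = rng.gamma(shape=1.0, scale=1.0)
    # One initial cluster label in {0, ..., num_init_clusters - 1} per observation.
    z = rng.integers(0, num_init_clusters, size=num_obs)
    return alpha, z


def holdout_split(num_docs, test_frac=0.10, seed=0):
    """Hypothetical helper: random train/test split of document indices,
    holding out test_frac of the documents (assumed uniform random)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_docs)
    n_test = int(round(test_frac * num_docs))
    # First n_test shuffled indices form the test set, the rest are training.
    return perm[n_test:], perm[:n_test]


if __name__ == "__main__":
    alpha, z = init_dpmm_state(num_obs=1000)
    train_idx, test_idx = holdout_split(num_docs=1000)
    print(f"alpha={alpha:.3f}, clusters used={len(np.unique(z))}")
    print(f"train={len(train_idx)}, test={len(test_idx)}")
```

Fixing `seed` makes the initial state and the split repeatable across runs, which is the detail a reproduction would need to report alongside the split sizes.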