Distributed Inference for Dirichlet Process Mixture Models

Authors: Hong Ge, Yutian Chen, Moquan Wan, Zoubin Ghahramani

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide both local thread-level and distributed machine-level parallel implementations and study the performance of this sampler through an extensive set of experiments on image and text data.
Researcher Affiliation | Academia | Hong Ge (HG344@CAM.AC.UK), Yutian Chen (YUTIAN.CHEN@ENG.CAM.AC.UK), Moquan Wan (MW545@CAM.AC.UK), Zoubin Ghahramani (ZOUBIN@ENG.CAM.AC.UK), Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
Pseudocode | Yes | Algorithm 1: The MR Sampler for DP
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We study the performance of the DP sampler on two data sets: the MNIST digit images and CIFAR-10 natural colour images with standard pre-processing steps. (...) The performance of the proposed MR sampler for the HDP mixture model is evaluated on the NIPS corpus (1.9 million words) and a subset of the Wikipedia corpus constructed by randomly selecting 10^5 documents (roughly 40 million words).
Dataset Splits | No | The performance is measured in terms of predictive perplexities on 10% separate hold-out test documents for both the NIPS and Wikipedia datasets. This specifies a test split but does not give the training/validation splits or the specific sample counts needed to reproduce all splits.
Hardware Specification | Yes | Both experiments are performed using Amazon EC2 instances with up to 32 cores. For experiments with more than 32 cores, we use a cluster of c3.8xlarge instances, each with 32 cores.
Software Dependencies | No | The paper describes the algorithms and models used but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For all experiments, we initialise the concentration parameter α ∼ G(1, 1) and randomly assign all the observations into 50 clusters. (...) For the FSD, a truncation level of 100 is used in all the experiments.
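The Experiment Setup row doubles as an initialisation recipe: α drawn from a Gamma(1, 1) prior, all observations randomly assigned to 50 initial clusters, and a truncation level of 100 for the finite symmetric Dirichlet (FSD) approximation. A minimal Python sketch of such an initialisation is given below; it assumes NumPy, and the function and variable names are illustrative rather than taken from the authors' (unreleased) code.

import numpy as np

def initialise_dp_state(n_obs, n_init_clusters=50, fsd_truncation=100, seed=0):
    """Initialise sampler state as described in the paper's experiment setup:
    alpha ~ Gamma(1, 1), observations randomly assigned to 50 clusters, and a
    truncation level of 100 for the FSD approximation. Names are illustrative,
    not from the authors' code."""
    rng = np.random.default_rng(seed)

    # Concentration parameter alpha ~ G(1, 1) (shape 1, scale 1).
    alpha = rng.gamma(shape=1.0, scale=1.0)

    # Randomly assign every observation to one of the 50 initial clusters.
    assignments = rng.integers(low=0, high=n_init_clusters, size=n_obs)

    # The FSD approximation keeps a fixed number of components (the truncation
    # level); components beyond the initial 50 simply start out empty.
    cluster_counts = np.zeros(fsd_truncation, dtype=int)
    np.add.at(cluster_counts, assignments, 1)

    return alpha, assignments, cluster_counts

For MNIST, say, n_obs would be the number of images; the sampler itself (Algorithm 1 in the paper) then updates the assignments, counts, and α from this starting state.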
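The Dataset Splits row pins down only a 10% held-out test set scored by predictive perplexity. Under the assumption of a simple random document-level split, the protocol amounts to something like the sketch below; the helper names are hypothetical, and the per-token log-probabilities would come from whichever trained model is being evaluated.

import numpy as np

def holdout_split(documents, test_fraction=0.10, seed=0):
    """Split a document collection into training and 10% held-out test sets.
    Illustrative only; the paper does not describe how its split was drawn."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(documents))
    n_test = int(round(test_fraction * len(documents)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return [documents[i] for i in train_idx], [documents[i] for i in test_idx]

def predictive_perplexity(token_log_probs):
    """Perplexity = exp(-mean per-token predictive log-likelihood); lower is better."""
    token_log_probs = np.asarray(token_log_probs)
    return float(np.exp(-token_log_probs.mean()))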