Distributed Inference for Dirichlet Process Mixture Models
Authors: Hong Ge, Yutian Chen, Moquan Wan, Zoubin Ghahramani
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide both local thread-level and distributed machine-level parallel implementations and study the performance of this sampler through an extensive set of experiments on image and text data. |
| Researcher Affiliation | Academia | Hong Ge (HG344@CAM.AC.UK), Yutian Chen (YUTIAN.CHEN@ENG.CAM.AC.UK), Moquan Wan (MW545@CAM.AC.UK), Zoubin Ghahramani (ZOUBIN@ENG.CAM.AC.UK); Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK |
| Pseudocode | Yes | Algorithm 1 The MR Sampler for DP |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We study the performance of the DP sampler on two data sets: the MNIST digit images and CIFAR-10 natural colour images with standard pre-processing steps. (...) The performance of the proposed MR sampler for the HDP mixture model is evaluated on the NIPS corpus (1.9 million words) and a subset of the Wikipedia corpus constructed by randomly selecting 10^5 documents (roughly 40 million words). |
| Dataset Splits | No | The performance is measured in terms of predictive perplexities on 10% separate hold-out test documents for both the NIPS and Wikipedia datasets. This specifies a test split but does not provide complete details for training/validation splits or specific sample counts needed for full reproducibility of all splits. |
| Hardware Specification | Yes | Both experiments are performed using Amazon EC2 instances with up to 32 cores. For experiments with more than 32 cores, we use a cluster of c3.8xlarge instances each with 32 cores. |
| Software Dependencies | No | The paper describes the algorithms and models used but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For all experiments, we initialise the concentration parameter α ∼ G(1, 1), and randomly assign all the observations into 50 clusters. (...) For the FSD, a truncation level of 100 is used in all the experiments. |
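
The initialisation quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes G(1, 1) denotes a Gamma distribution with shape 1 and rate 1 (which coincides with scale 1), and the function name `initialise_state`, the seed, and the observation count are hypothetical.

```python
import random


def initialise_state(num_observations, num_clusters=50, seed=0):
    """Hypothetical helper: draw alpha and a random initial clustering."""
    rng = random.Random(seed)
    # Assumption: G(1, 1) is Gamma(shape=1, rate=1); with both parameters
    # equal to 1 the shape-rate and shape-scale forms are identical.
    alpha = rng.gammavariate(1.0, 1.0)
    # Assign every observation uniformly at random to one of the clusters.
    assignments = [rng.randrange(num_clusters) for _ in range(num_observations)]
    return alpha, assignments


# Example with an arbitrary number of observations (not taken from the paper).
alpha, assignments = initialise_state(num_observations=1000)
```

A fixed seed is used here only so the sketch is repeatable; the paper does not state how random seeds were handled.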