Distributed Stochastic Gradient MCMC

Authors: Sungjin Ahn, Babak Shahbaba, Max Welling

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments for LDA on Wikipedia and PubMed show that, relative to the state of the art in distributed MCMC, we reduce compute time from 27 hours to half an hour in order to reach the same perplexity level.
Researcher Affiliation | Academia | Sungjin Ahn (SUNGJIA@ICS.UCI.EDU), Department of Computer Science, University of California, Irvine; Babak Shahbaba (BABAKS@UCI.EDU), Department of Statistics, University of California, Irvine; Max Welling (M.WELLING@UVA.NL), Machine Learning Group, University of Amsterdam
Pseudocode | Yes | Algorithm 1: D-SGLD Pseudo Code (a minimal sketch of the per-worker update appears after this table)
Open Source Code | No | The paper does not contain an explicit statement about the availability of open-source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | We used the same vocabulary of 7702 words as used by Hoffman et al. (2010). (ii) The PubMed Abstract corpus contains 8.2M articles of approximately 730M tokens in total. After removing stopwords and low-occurrence (less than 300) words, we obtained a vocabulary of 39,987 words.
Dataset Splits | No | The predictive perplexities were computed on a separate holdout set of 1000 documents, with a 90/10 (training/test) split, and LDA's hyper-parameters were set to α = 0.01 and β = 0.0001 following Patterson & Teh (2013). A validation split is not explicitly mentioned.
Hardware Specification | No | The paper mentions 'a cluster of 20 workers' and '20 homogeneous workers' but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions 'For our Python implementation' but does not specify software dependencies with version numbers (e.g., the Python version, or specific libraries such as NumPy, SciPy, or machine learning frameworks with their versions).
Experiment Setup | Yes | Following Patterson & Teh (2013), we set the mini-batch size to 50 documents, and for each update of Eqn. (7) we ran 100 Gibbs iterations for each document in the mini-batch. The step sizes were annealed by the schedule ε_t = a(1 + t/b)^(-c). As we fixed b = 1000 and c = 0.6, the entire schedule was set by a, which we chose by running parallel chains with different a's and then choosing the best. (...) LDA's hyper-parameters were set to α = 0.01 and β = 0.0001 following Patterson & Teh (2013). The number of topics K was set to 100. (...) we set the trajectory length τ = 10 for all workers. (The step-size schedule is sketched in code below.)
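The paper's Algorithm 1 (D-SGLD) is not reproduced on this page. As a rough guide, the snippet below is a minimal Python sketch of the stochastic gradient Langevin step that each worker would apply to its local data shard; the names sgld_step, grad_log_prior, and grad_log_lik are illustrative assumptions, not the authors' code. In D-SGLD, each worker runs a trajectory of such steps (τ = 10 in the quoted setup) before the chain is handed to another worker.

```python
import numpy as np

def sgld_step(theta, minibatch, grad_log_prior, grad_log_lik, eps, N):
    """One stochastic gradient Langevin update on a single worker (sketch).

    theta          -- current parameter vector
    minibatch      -- mini-batch drawn from this worker's data shard
    grad_log_prior -- callable returning the gradient of log p(theta)
    grad_log_lik   -- callable returning the gradient of log p(x | theta) for one item x
    eps            -- current step size epsilon_t
    N              -- total number of data items (rescales the mini-batch gradient)
    """
    n = len(minibatch)
    # Unbiased stochastic estimate of the full-data log-posterior gradient.
    grad = grad_log_prior(theta) + (N / n) * sum(
        grad_log_lik(x, theta) for x in minibatch
    )
    # Langevin dynamics: half-step along the gradient plus N(0, eps) injected noise.
    noise = np.random.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise
```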
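The quoted annealing schedule can be written out directly; b = 1000 and c = 0.6 are fixed to the stated values, while the function name and the example value of a below are assumptions for illustration.

```python
def step_size(t, a, b=1000.0, c=0.6):
    """Annealed step size eps_t = a * (1 + t / b) ** (-c) (sketch).

    b and c are fixed to the quoted values; a is tuned by running parallel
    chains with different values and keeping the best one.
    """
    return a * (1.0 + t / b) ** (-c)

# With a hypothetical a = 0.01, the step size decays from 0.01 at t = 0
# to about 0.01 * 2 ** -0.6 ≈ 0.0066 at t = 1000.
```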