Efficient Training of LDA on a GPU by Mean-for-Mode Estimation

Authors: Jean-Baptiste Tristan, Joseph Tassarotti, Guy L. Steele Jr.

ICML 2015

Each entry below gives a reproducibility variable, its assessed result, and the LLM response supporting that assessment.

Research Type: Experimental
  "We have run a series of experiments which show that in practice, Mean-for-Mode estimation converges in fewer samples than standard uncollapsed Gibbs sampling. In these experiments, we observed how the log-likelihood of LDA evolves with the number of samples. Figure 3 presents the results of one of our experiments, run on a subset of Wikipedia... We present the resulting benchmarks in Figures 4 and 5 to show how the gap between the GPU algorithms' runtimes and that of a collapsed Gibbs sampler scales." (A sketch of the log-likelihood metric follows the table.)

Researcher Affiliation: Collaboration
  Jean-Baptiste Tristan (JEAN.BAPTISTE.TRISTAN@ORACLE.COM), Oracle Labs, USA; Joseph Tassarotti (JTASSARO@CS.CMU.EDU), Department of Computer Science, Carnegie Mellon University, USA; Guy L. Steele Jr. (GUY.STEELE@ORACLE.COM), Oracle Labs, USA.

Pseudocode: Yes
  Algorithm 1: Drawing the latent variables. Algorithm 2: Estimation of the φ variables. Algorithm 3: LDA sampler using both sparse and dense matrices. (A hedged sketch of the first two steps follows the table.)

Open Source Code: No
  The paper neither states that the code for the described methodology is open-sourced nor provides a link to a code repository.

Open Datasets: Yes
  "on a subset of Wikipedia (50,000 documents, 3,000,000 tokens, 40,000 vocabulary words)"

Dataset Splits: No
  The paper mentions using a Wikipedia subset of a specific size and varying the number of documents and topics across experiments, but it does not provide training/validation/test splits (e.g., percentages or sample counts).

Hardware Specification: Yes
  "We implemented this algorithm on an NVIDIA Titan Black, as well as the uncollapsed Gibbs sampler. We also implemented a collapsed Gibbs sampler for comparison, on an Intel i7-4820K CPU."

Software Dependencies: No
  The paper discusses GPU and CPU implementations but does not give version numbers for any software dependencies (e.g., CUDA toolkit or compiler versions).

Experiment Setup: Yes
  "using 20 topics, and both α and β equal to 0.1. ... The experiment was run 10 times with a varying seed for the random number generator. ... The number of initial iterations that are done using dense probability matrices corresponds to the parameter D." (See the configuration sketch after the table.)