Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC

Authors: Yulai Cong, Bo Chen, Hongwei Liu, Mingyuan Zhou

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experimental results on three benchmark corpora: 20Newsgroups (20News), Reuters Corpus Volume I (RCV1) that is moderately large, and Wikipedia (Wiki) that is huge.
Researcher Affiliation | Academia | (1) National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing and Understanding, Xidian University, Xi'an, China. (2) McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA.
Pseudocode | Yes | Algorithm 1: TLASGR-MCMC for DLDA (PGBN).
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it link to a code repository.
Open Datasets | Yes | We present experimental results on three benchmark corpora: 20Newsgroups (20News), Reuters Corpus Volume I (RCV1) that is moderately large, and Wikipedia (Wiki) that is huge. ... To make a fair comparison, these corpora, including the training/testing partitions, are set to be the same as those in Gan et al. (2015) and Henao et al. (2015). ... we apply a three-layer Poisson randomized gamma gamma belief network (PRG-GBN) (Zhou et al., 2016a) to 60,000 MNIST digits...
Dataset Splits | Yes | 20News consists of 18,845 documents with a vocabulary size of 2,000, partitioned into 11,315 training documents and 7,531 test ones. RCV1 consists of 804,414 documents with a vocabulary size of 10,000, where 10,000 documents are randomly selected for testing. Wiki consists of 10 million documents... randomly select 1,000 documents for testing. ... for each test document, we randomly select 80% of the word tokens to sample the local variables specific to the document, under the global model parameters of each MCMC iteration; ... we normalize these accumulated Poisson rates to calculate the perplexity using the remaining 20% word tokens.
Hardware Specification | No | The paper mentions aiming to run on 'a regular personal computer' and applying a model to '60,000 MNIST digits', but it does not provide specific hardware details such as GPU or CPU models, memory size, or other specifications used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the proposed algorithms, we set the mini-batch size as 200, and use as burn-in 2000 mini-batches for both 20News and RCV1 and 3500 mini-batches for Wiki. We collect 1500 samples to calculate perplexity. For point perplexity, given the global parameters of an MCMC sample, we sample the local variables with 600 iterations and collect one sample every two iterations during the last 400 iterations. The hyperparameters of DLDA are set as: η^(l) = 1/K_l, a_0 = b_0 = 0.01, and γ_0 = c_0 = e_0 = f_0 = 1.
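The held-out perplexity protocol quoted in the Dataset Splits row (sample local variables on 80% of each test document's tokens, then score the remaining 20% with normalized accumulated Poisson rates) can be summarized in a short sketch. The function below is illustrative only; the array names and the smoothing constant are assumptions, not taken from the authors' code.

```python
import numpy as np

def heldout_perplexity(accumulated_rates, heldout_counts):
    """Hedged sketch of the held-out perplexity protocol described above.

    accumulated_rates : (V, D) array of per-word, per-document Poisson rates,
        accumulated over the collected MCMC samples (local variables fit on
        the 80% token split of each test document).
    heldout_counts : (V, D) array of word counts for the remaining 20% tokens.
    """
    # Normalize the accumulated Poisson rates into per-document word distributions.
    probs = accumulated_rates / accumulated_rates.sum(axis=0, keepdims=True)
    # Per-token log-likelihood on the held-out 20% of tokens
    # (small constant added for numerical safety; an assumption, not from the paper).
    n_heldout = heldout_counts.sum()
    loglik = (heldout_counts * np.log(probs + 1e-12)).sum()
    return np.exp(-loglik / n_heldout)
```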
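For reference, the reported settings in the Experiment Setup row can be collected into a single configuration. The structure and key names below are hypothetical and only mirror the values quoted from the paper.

```python
# Hypothetical configuration mirroring the quoted experiment setup; key names
# are illustrative and not taken from the authors' implementation.
dlda_experiment_setup = {
    "minibatch_size": 200,
    "burn_in_minibatches": {"20News": 2000, "RCV1": 2000, "Wiki": 3500},
    "collected_samples": 1500,       # samples collected to calculate perplexity
    "point_perplexity": {
        "local_iterations": 600,     # local-variable sampling per MCMC sample
        "collect_every": 2,          # one sample every two iterations...
        "collect_during_last": 400,  # ...during the last 400 iterations
    },
    "hyperparameters": {
        "eta_l": "1 / K_l",          # eta^(l) = 1/K_l for layer l
        "a0": 0.01, "b0": 0.01,
        "gamma0": 1.0, "c0": 1.0, "e0": 1.0, "f0": 1.0,
    },
}
```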