Scalable Deep Poisson Factor Analysis for Topic Modeling

Authors: Zhe Gan, Changyou Chen, Ricardo Henao, David Carlson, Lawrence Carin

ICML 2015

Reproducibility assessment (each variable lists the result, followed by the LLM response supporting it):

Research Type: Experimental
LLM Response: "Experimental results on several corpora show that the proposed approach readily handles very large collections of text documents, infers structured topic representations, and obtains superior test perplexities when compared with related models."

Researcher Affiliation: Academia
LLM Response: Zhe Gan (ZHE.GAN@DUKE.EDU), Changyou Chen (CHANGYOU.CHEN@DUKE.EDU), Ricardo Henao (RICARDO.HENAO@DUKE.EDU), David Carlson (DAVID.CARLSON@DUKE.EDU), Lawrence Carin (LCARIN@DUKE.EDU), Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA.

Pseudocode: Yes
LLM Response: The paper provides Algorithm 1, "BCDF algorithm for DPFA." (A hedged sketch of a BCDF-style update appears after this table.)

Open Source Code: No
LLM Response: The paper does not provide an explicit statement or link indicating that the source code for its methodology is publicly released.

Open Datasets: Yes
LLM Response: "Finally, we downloaded 10M random documents from Wikipedia using scripts provided in Hoffman et al. (2010) and randomly selected 1K documents for testing."

Dataset Splits: Yes
LLM Response: "To choose good parameters for SGNHT, e.g., the step size and the variance of the injected noise, we randomly choose about 10% documents from the training data as a validation set." (A minimal split helper is sketched after this table.)

Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models or memory specifications.

Software Dependencies: No
LLM Response: The paper names the models and algorithms used for comparison (e.g., RSM is trained with contrastive divergence; for nHDP, the publicly available code from Paisley et al. (2015) is used), but it does not list software dependencies or library versions for its own implementation.

Experiment Setup: Yes
LLM Response: "For 20 Newsgroups and RCV1-v2 corpora, we use 2,000 mini-batches for burn-in followed by 1,500 collection samples to calculate test perplexities; while for the Wikipedia dataset, 3,500 mini-batches are used for burn-in. The mini-batch size for all stochastic algorithms is set to 100. ... We set the hyperparameters of DPFA as aφ = 1.01, c0 = e0 = 1, f0 = 0.01, and pn = 0.5." (These settings are transcribed into a config sketch after this table.)
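
The paper's Algorithm 1 is not reproduced here; the following is a minimal Python sketch of how a BCDF-style (Bayesian conditional density filtering) mini-batch update for the Poisson factor analysis layer of DPFA could look. All names (bcdf_step, stats, a_phi) are illustrative assumptions, not the authors' code, and the deep sigmoid-belief-network layers of DPFA are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def bcdf_step(X_batch, Phi, Theta_batch, stats, a_phi=1.01):
    """Hypothetical BCDF-style update of the topic-word matrix Phi.

    X_batch     : (batch, vocab) observed word counts
    Phi         : (n_topics, vocab) current topic-word probabilities
    Theta_batch : (batch, n_topics) per-document topic intensities
    stats       : (n_topics, vocab) running sufficient statistics
    """
    n_topics, vocab = Phi.shape
    counts = np.zeros((n_topics, vocab))
    for d in range(X_batch.shape[0]):
        # Allocate each word's count across topics in proportion to
        # theta_dk * phi_kv (the usual Poisson/multinomial augmentation).
        rates = Theta_batch[d][:, None] * Phi + 1e-12
        probs = rates / rates.sum(axis=0)
        for v in np.nonzero(X_batch[d])[0]:
            counts[:, v] += rng.multinomial(int(X_batch[d, v]), probs[:, v])
    # Conditional density filtering: fold mini-batch statistics into the
    # running totals, then sample Phi from its conditional Dirichlet.
    stats = stats + counts
    Phi_new = np.vstack(
        [rng.dirichlet(a_phi + stats[k]) for k in range(n_topics)]
    )
    return Phi_new, stats
```

In an actual run, this step would be repeated over streamed mini-batches, with the per-document intensities Theta_batch themselves resampled from their conditionals between updates.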
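The 10% validation split quoted above is simple to reproduce; a minimal sketch follows, where the function name, seed, and corpus size are assumptions rather than details from the paper.

```python
import numpy as np

def holdout_split(n_docs, frac=0.10, seed=0):
    """Randomly hold out ~frac of training documents as a validation set
    (hypothetical helper; the paper only states that about 10% of training
    documents are randomly chosen for validation)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_docs)
    n_val = int(round(frac * n_docs))
    return perm[n_val:], perm[:n_val]  # train indices, validation indices

train_idx, val_idx = holdout_split(n_docs=10_000)  # placeholder corpus size
```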
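For reference, the reported experiment settings can be collected into a single configuration. The values below are as stated in the paper; the key names are illustrative assumptions, not taken from the authors' code.

```python
# Illustrative transcription of the reported settings.
experiment_config = {
    "minibatch_size": 100,                 # all stochastic algorithms
    "burn_in_minibatches": {
        "20newsgroups": 2000,
        "rcv1_v2": 2000,
        "wikipedia": 3500,
    },
    "collection_samples": 1500,            # used to compute test perplexity
    "dpfa_hyperparameters": {
        "a_phi": 1.01, "c0": 1.0, "e0": 1.0, "f0": 0.01, "p_n": 0.5,
    },
}
```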