Scalable Deep Poisson Factor Analysis for Topic Modeling
Authors: Zhe Gan, Changyou Chen, Ricardo Henao, David Carlson, Lawrence Carin
ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several corpora show that the proposed approach readily handles very large collections of text documents, infers structured topic representations, and obtains superior test perplexities when compared with related models. |
| Researcher Affiliation | Academia | Zhe Gan (ZHE.GAN@DUKE.EDU), Changyou Chen (CHANGYOU.CHEN@DUKE.EDU), Ricardo Henao (RICARDO.HENAO@DUKE.EDU), David Carlson (DAVID.CARLSON@DUKE.EDU), Lawrence Carin (LCARIN@DUKE.EDU); Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA |
| Pseudocode | Yes | Algorithm 1 BCDF algorithm for DPFA. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for their methodology is publicly released. |
| Open Datasets | Yes | Finally, we downloaded 10M random documents from Wikipedia using scripts provided in Hoffman et al. (2010) and randomly selected 1K documents for testing. |
| Dataset Splits | Yes | To choose good parameters for SGNHT, e.g., the step size and the variance of the injected noise, we randomly choose about 10% of the documents from the training data as a validation set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions models and algorithms used (e.g., 'RSM is trained using contrastive divergence', 'nHDP, we use the publicly available code from Paisley et al. (2015)'), but it does not provide specific software dependencies or library versions for its own implementation. |
| Experiment Setup | Yes | For the 20 Newsgroups and RCV1-v2 corpora, we use 2,000 mini-batches for burn-in followed by 1,500 collection samples to calculate test perplexities; while for the Wikipedia dataset, 3,500 mini-batches are used for burn-in. The mini-batch size for all stochastic algorithms is set to 100. ... We set the hyperparameters of DPFA as a_φ = 1.01, c_0 = e_0 = 1, f_0 = 0.01, and p_n = 0.5. |
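
For reference, the experiment setup quoted above can be collected into a single configuration sketch. The following minimal Python snippet is assembled only from the settings reported in the table; the key names (e.g., `minibatch_size`, `a_phi`, `validation_fraction`) are illustrative assumptions and do not come from the authors' implementation, which is not publicly released.

```python
# Hypothetical summary of the experiment configuration described in the table.
# All numeric values are taken from the quoted setup; names are assumptions.
EXPERIMENT_CONFIG = {
    "minibatch_size": 100,               # used by all stochastic algorithms
    "burn_in_minibatches": {
        "20newsgroups": 2000,
        "rcv1_v2": 2000,
        "wikipedia": 3500,
    },
    "collection_samples": 1500,          # samples used to compute test perplexity
    # DPFA hyperparameters reported in the paper
    "a_phi": 1.01,
    "c0": 1.0,
    "e0": 1.0,
    "f0": 0.01,
    "p_n": 0.5,
    # About 10% of training documents are held out as a validation set for
    # tuning the SGNHT step size and injected-noise variance.
    "validation_fraction": 0.1,
}
```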