Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Authors: Mingyuan Zhou

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We consider the JACM, PsyReview, and NIPS12 corpora, restricting the vocabulary to terms that occur in five or more documents. ... To evaluate the BNBP topic model and its performance relative to that of the HDP-LDA, which are both nonparametric Bayesian algorithms, we randomly choose 50% of the words in each document as training, and use the remaining ones to calculate per-word held-out perplexity. |
| Researcher Affiliation | Academia | Mingyuan Zhou, IROM Department, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA, mingyuan.zhou@mccombs.utexas.edu |
| Pseudocode | No | The paper describes algorithms and update equations in text (e.g., $P(z_{ji} = k \mid x, z^{-ji}, \gamma_0, m, c, r)$ in Section 3) but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Matlab code available at http://mingyuanzhou.github.io/ |
| Open Datasets | Yes | We consider the JACM, PsyReview, and NIPS12 corpora... [1] http://www.cs.princeton.edu/~blei/downloads/ [2] http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm [3] http://www.cs.nyu.edu/~roweis/data.html |
| Dataset Splits | No | We randomly choose 50% of the words in each document as training, and use the remaining ones to calculate per-word held-out perplexity. |
| Hardware Specification | No | On a 3.4 GHz CPU, the fully collapsed Gibbs sampler of the BNBP topic model takes about 2.5 seconds per iteration on the NIPS12 corpus when the inferred number of topics is around 180. |
| Software Dependencies | No | All algorithms are implemented with unoptimized Matlab code. |
| Experiment Setup | Yes | We set the hyperparameters as a0 = b0 = e0 = f0 = 0.01. We consider 2500 Gibbs sampling iterations and collect the last 1500 samples. In each iteration, we randomize the ordering of the words. ... Similar to [26, 10], we set the topic Dirichlet smoothing parameter as η = 0.01, 0.02, 0.05, 0.10, 0.25, or 0.50. |
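
The evaluation protocol quoted above (a random 50% of the words in each document for training, the remainder for per-word held-out perplexity, and 2500 Gibbs iterations with the last 1500 collected) can be sketched roughly as follows. This is a minimal Python illustration, not the author's Matlab code. The names `split_document`, `per_word_perplexity`, and `word_probs` are hypothetical, and the perplexity formula used here (the exponential of the negative mean held-out log-probability) is an assumed, commonly used definition; the paper's exact estimator, which averages over collected Gibbs samples, may differ.

```python
import math
import random
from collections import Counter

# Settings quoted in the Experiment Setup entry.
N_ITERATIONS = 2500                    # total Gibbs sampling iterations
N_COLLECTED = 1500                     # last samples that are collected
BURN_IN = N_ITERATIONS - N_COLLECTED   # iterations run before collection starts
ETA_GRID = [0.01, 0.02, 0.05, 0.10, 0.25, 0.50]  # topic Dirichlet smoothing values


def split_document(words, rng, train_fraction=0.5):
    """Randomly assign a fraction of a document's word tokens to training.

    Returns (train_tokens, heldout_tokens). Hypothetical helper.
    """
    shuffled = list(words)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def per_word_perplexity(heldout_docs, word_probs):
    """Assumed definition: exp(-(sum of held-out log-probabilities) / token count).

    word_probs[j][w] is the model's predictive probability of word w in
    document j. Probabilities must be strictly positive.
    """
    total_log_prob = 0.0
    total_tokens = 0
    for j, doc in enumerate(heldout_docs):
        for word, count in Counter(doc).items():
            total_log_prob += count * math.log(word_probs[j][word])
            total_tokens += count
    return math.exp(-total_log_prob / total_tokens)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the split is repeatable
    doc = ["topic", "model", "gibbs", "topic", "beta", "process", "model", "topic"]
    train, heldout = split_document(doc, rng)
    # Stand-in predictive distribution: uniform over the document's vocabulary.
    vocab = sorted(set(doc))
    uniform = [{w: 1.0 / len(vocab) for w in vocab}]
    print(len(train), len(heldout), per_word_perplexity([heldout], uniform))
```

Under a uniform predictive distribution the perplexity equals the vocabulary size, which gives a quick sanity check before plugging in probabilities from a fitted topic model.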