Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling
Authors: Mingyuan Zhou
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider the JACM, Psy Review, and NIPS12 corpora, restricting the vocabulary to terms that occur in five or more documents. ...To evaluate the BNBP topic model and its performance relative to that of the HDP-LDA, which are both nonparametric Bayesian algorithms, we randomly choose 50% of the words in each document as training, and use the remaining ones to calculate per-word held-out perplexity. |
| Researcher Affiliation | Academia | Mingyuan Zhou, IROM Department, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA. mingyuan.zhou@mccombs.utexas.edu |
| Pseudocode | No | The paper describes algorithms and update equations in text (e.g., P(z_ji = k \| x, z^{-ji}, γ0, m, c, r) in Section 3) but does not provide a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Matlab code available at http://mingyuanzhou.github.io/ |
| Open Datasets | Yes | We consider the JACM, Psy Review, and NIPS12 corpora... [1] http://www.cs.princeton.edu/~blei/downloads/ [2] http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm [3] http://www.cs.nyu.edu/~roweis/data.html |
| Dataset Splits | No | we randomly choose 50% of the words in each document as training, and use the remaining ones to calculate per-word held-out perplexity. |
| Hardware Specification | No | On a 3.4 GHz CPU, the fully collapsed Gibbs sampler of the BNBP topic model takes about 2.5 seconds per iteration on the NIPS12 corpus when the inferred number of topics is around 180. |
| Software Dependencies | No | All algorithms are implemented with unoptimized Matlab code. |
| Experiment Setup | Yes | We set the hyperparameters as a0 = b0 = e0 = f0 = 0.01. We consider 2500 Gibbs sampling iterations and collect the last 1500 samples. In each iteration, we randomize the ordering of the words. ...Similar to [26, 10], we set the topic Dirichlet smoothing parameter as η = 0.01, 0.02, 0.05, 0.10, 0.25, or 0.50. |
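The evaluation protocol quoted above (randomly hold out 50% of the words in each document, then score the held-out words with the trained model) can be sketched as follows. This is an illustrative reconstruction, not the paper's Matlab code: the function names (`split_words`, `per_word_perplexity`) and the use of point-estimate matrices `doc_topic` (θ) and `topic_word` (φ) are assumptions; the paper averages predictive probabilities over collected Gibbs samples rather than using a single point estimate.

```python
import numpy as np

def split_words(doc_word_ids, train_frac=0.5, rng=None):
    """Randomly assign a fraction of a document's word tokens to training
    and the rest to the held-out set (the paper's 50/50 word split)."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(doc_word_ids))
    n_train = int(len(doc_word_ids) * train_frac)
    train = [doc_word_ids[i] for i in idx[:n_train]]
    heldout = [doc_word_ids[i] for i in idx[n_train:]]
    return train, heldout

def per_word_perplexity(heldout_docs, doc_topic, topic_word):
    """Per-word held-out perplexity: exp(-(1/N) * sum_w log p(w | d)),
    where p(w | d) = sum_k theta_{dk} * phi_{kw}."""
    log_lik, n_words = 0.0, 0
    for d, words in enumerate(heldout_docs):
        p_w = doc_topic[d] @ topic_word  # predictive word distribution for doc d
        for w in words:
            log_lik += np.log(p_w[w])
            n_words += 1
    return np.exp(-log_lik / n_words)
```

As a sanity check, uniform θ and φ over V vocabulary terms yield a per-word perplexity of exactly V, the value a model with no learned structure would achieve.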