Precision-Recall Balanced Topic Modelling
Authors: Seppo Virtanen, Mark Girolami
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the proposed approach is effective and infers more coherent topics than existing related approaches. |
| Researcher Affiliation | Academia | Seppo Virtanen, University of Cambridge, sjv35@cam.ac.uk; Mark Girolami, University of Cambridge and The Alan Turing Institute, mag92@cam.ac.uk |
| Pseudocode | No | The paper describes the collapsed Gibbs sampling algorithm in prose but does not present it in a pseudocode block or algorithm environment (an illustrative baseline sketch follows the table). |
| Open Source Code | No | No explicit statement or link providing open-source code for the described methodology was found. |
| Open Datasets | Yes | We show the model performance for three subsets of publicly available data collections, NYTIMES (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words), movie reviews (http://www.cs.cornell.edu/people/pabo/movie-review-data/) and 20newsgroup (http://qwone.com/~jason/20Newsgroups/), as well as for textual product descriptions combined with categorical information that we employ for further evaluations. |
| Dataset Splits | Yes | We sample 1/5 of the documents for each data collection to create a test set containing cM documents. (A split sketch follows the table.) |
| Hardware Specification | No | No specific hardware specifications (e.g., GPU/CPU models, memory) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions using "R-INLA" but does not specify a version number for it or for any other software dependencies. |
| Experiment Setup | Yes | We initialise the assignments randomly and set αk = 0.1 and γ = 0.01, corresponding to weakly informative priors, and use 5 × 10^3 sampling steps as burn-in. After the burn-in we collect posterior averages for S = 200 samples. We infer the models for K = 200 topics and for 21 equi-spaced values of λ in (0, 0.2), noting that λ = 0 corresponds to the standard topic model (LDA). (A configuration sketch follows the table.) |
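
Since the paper presents its collapsed Gibbs sampler only in prose and releases no code, here is a minimal sketch of a collapsed Gibbs sampler for standard LDA, i.e. the λ = 0 baseline quoted in the Experiment Setup row. The precision-recall balanced full conditional from the paper is not reproduced; all function and variable names are illustrative, and the prior values are taken from the table above.

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K=200, alpha=0.1, gamma=0.01,
                        burnin=5_000, n_samples=200, seed=0):
    """Minimal collapsed Gibbs sampler for standard LDA (the lambda = 0
    baseline). `docs` is a list of lists of word indices; V is the
    vocabulary size. Prior values follow those quoted in the table."""
    rng = np.random.default_rng(seed)
    M = len(docs)
    ndk = np.zeros((M, K))           # document-topic counts
    nkv = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # tokens per topic
    z = []                           # topic assignment per token
    for d, doc in enumerate(docs):   # random initialisation, as in the paper
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    theta_sum = np.zeros((M, K))
    for step in range(burnin + n_samples):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove the token's current assignment
                ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
                # standard LDA full conditional for z_{di}
                p = (ndk[d] + alpha) * (nkv[:, w] + gamma) / (nk + V * gamma)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
        if step >= burnin:           # average S posterior samples after burn-in
            theta_sum += (ndk + alpha) / (ndk.sum(1, keepdims=True) + K * alpha)
    return theta_sum / n_samples     # posterior-mean document-topic proportions
```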
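The Dataset Splits row states only that 1/5 of the documents are held out; one straightforward way to realise that split is sketched below. The random seed and the rounding behaviour are assumptions not stated in the paper.

```python
import numpy as np

def split_docs(docs, frac=0.2, seed=0):
    """Hold out a random 1/5 of the documents as the test set."""
    rng = np.random.default_rng(seed)
    n = len(docs)
    test_idx = set(rng.choice(n, size=int(frac * n), replace=False).tolist())
    train = [doc for i, doc in enumerate(docs) if i not in test_idx]
    test = [doc for i, doc in enumerate(docs) if i in test_idx]
    return train, test
```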
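Finally, the Experiment Setup row fully pins down the hyperparameters and the λ sweep, which can be expressed as a short configuration sketch. Including both endpoints of (0, 0.2) is an assumption, since the paper states only the interval and the count, but λ = 0 must be among the values because the paper notes it recovers standard LDA.

```python
import numpy as np

K, alpha, gamma = 200, 0.1, 0.01     # values quoted in the table above
burnin, n_samples = 5_000, 200       # burn-in steps and collected samples S
# 21 equi-spaced lambda values on (0, 0.2); endpoint handling is assumed.
lambdas = np.linspace(0.0, 0.2, 21)  # lambdas[0] == 0.0 recovers LDA
for lam in lambdas:
    ...  # fit the lambda-balanced model here (no official code is released)
```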