Precision-Recall Balanced Topic Modelling

Authors: Seppo Virtanen, Mark Girolami

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate the proposed approach is effective and infers more coherent topics than existing related approaches."
Researcher Affiliation | Academia | Seppo Virtanen, University of Cambridge (sjv35@cam.ac.uk); Mark Girolami, University of Cambridge and The Alan Turing Institute (mag92@cam.ac.uk)
Pseudocode | No | The paper describes the collapsed Gibbs sampling algorithm in prose but does not present it in a pseudocode block or algorithm environment. (A hedged baseline sketch follows the table.)
Open Source Code | No | No explicit statement or link providing open-source code for the described methodology was found.
Open Datasets | Yes | "We show the model performance for three subsets of publicly available data collections, NYTIMES (https://archive.ics.uci.edu/ml/datasets/Bag+of+Words), movie reviews (http://www.cs.cornell.edu/people/pabo/movie-review-data/) and 20newsgroup (http://qwone.com/~jason/20Newsgroups/), as well as for textual product descriptions combined with categorical information that we employ for further evaluations."
Dataset Splits | Yes | "We sample 1/5 of the documents for each data collection to create a test set containing M̂ documents." (A hedged split sketch follows the table.)
Hardware Specification | No | No specific hardware specifications (e.g., GPU/CPU models, memory) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions using "R-INLA" but does not specify a version number for it or for any other software dependency.
Experiment Setup | Yes | "We initialise the assignments randomly and set α_k = 0.1 and γ = 0.01, corresponding to weakly informative priors, and use 5 × 10^3 sampling steps as burn-in. After the burn-in we collect posterior averages for S = 200 samples. We infer the models for K = 200 topics and for 21 equi-spaced values of λ between 0 and 0.2, noting that λ = 0 corresponds to the standard topic model (LDA)." (A hedged configuration sketch follows the table.)
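
Since the paper presents its sampler only in prose, the following is a minimal sketch of a collapsed Gibbs sweep for the λ = 0 baseline (standard LDA), using the priors quoted above. It is our reconstruction, not the authors' code, and it omits the precision-recall balancing term that is the paper's contribution; the function name collapsed_gibbs_lda and all variable names are ours.

```python
import numpy as np

def collapsed_gibbs_lda(docs, V, K=200, alpha=0.1, gamma=0.01,
                        n_burnin=5_000, n_samples=200, seed=0):
    """Collapsed Gibbs sampler for plain LDA (the paper's lambda = 0 case).

    docs: list of documents, each a list/array of word ids in [0, V).
    Returns posterior-averaged topic-word (phi) and doc-topic (theta) matrices.
    """
    rng = np.random.default_rng(seed)
    M = len(docs)
    ndk = np.zeros((M, K))            # topic counts per document
    nkw = np.zeros((K, V))            # word counts per topic
    nk = np.zeros(K)                  # total token count per topic

    # Random initialisation of the topic assignments, as in the paper.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    phi = np.zeros((K, V))
    theta = np.zeros((M, K))
    for it in range(n_burnin + n_samples):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token from the counts ...
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # ... sample from the collapsed conditional p(z_i = k | rest) ...
                p = (ndk[d] + alpha) * (nkw[:, w] + gamma) / (nk + V * gamma)
                k = rng.choice(K, p=p / p.sum())
                # ... and add it back under the new assignment.
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        if it >= n_burnin:
            # Accumulate posterior averages after burn-in.
            phi += (nkw + gamma) / (nk[:, None] + V * gamma)
            theta += (ndk + alpha) / (ndk.sum(1, keepdims=True) + K * alpha)
    return phi / n_samples, theta / n_samples
```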
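
The dataset-split row quotes the only preprocessing detail the paper gives. A minimal way to reproduce it, assuming a uniform random sample of documents without replacement (the paper does not specify the sampling scheme), is:

```python
import numpy as np

def split_held_out(docs, frac=1 / 5, seed=0):
    """Hold out a fraction of documents (1/5 in the paper) as a test set."""
    rng = np.random.default_rng(seed)
    M = len(docs)
    held = set(rng.choice(M, size=int(round(frac * M)), replace=False).tolist())
    train = [doc for d, doc in enumerate(docs) if d not in held]
    test = [doc for d, doc in enumerate(docs) if d in held]
    return train, test
```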
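
Finally, the quoted experiment setup translates directly into a small configuration sketch. Only the numeric settings come from the paper; the driver loop and the placeholder comment are ours.

```python
import numpy as np

# Settings quoted in the Experiment Setup row.
K = 200                # number of topics
ALPHA = 0.1            # symmetric Dirichlet prior alpha_k on doc-topic weights
GAMMA = 0.01           # Dirichlet prior gamma on topic-word distributions
N_BURNIN = 5_000       # burn-in sampling steps
N_SAMPLES = 200        # posterior samples collected after burn-in (S)

# 21 equi-spaced lambda values; lambda = 0 recovers the standard topic model.
lambdas = np.linspace(0.0, 0.2, 21)

for lam in lambdas:
    # The authors' precision-recall balanced sampler is not public; one fit per
    # lambda would go here. At lam == 0.0 it reduces to plain LDA, e.g. the
    # collapsed_gibbs_lda sketch above (which has no lambda term at all).
    pass
```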