Ordinal Mixed Membership Models

Authors: Seppo Virtanen, Mark Girolami

Venue: ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply the models to a collection of consumer-generated reviews of mobile software applications, where each review contains unstructured text data accompanied with an ordinal rating, and demonstrate that the models infer useful and meaningful recurring patterns of consumer feedback. We also compare the developed models to relevant existing works, which rely on improper statistical assumptions for ordinal variables, showing significant improvements both in predictive ability and knowledge extraction.
Researcher Affiliation | Academia | Seppo Virtanen S.VIRTANEN@WARWICK.AC.UK Mark Girolami M.GIROLAMI@WARWICK.AC.UK Department of Statistics, University of Warwick, CV4 7AL Coventry UK
Pseudocode | No | The paper describes the inference algorithms (Variational Bayesian Inference, MCMC Sampling Scheme) in text and equations, but does not provide structured pseudocode blocks or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or include links to code repositories.
Open Datasets | No | We collect consumer-generated reviews of mobile software applications (apps) from Apple’s App Store. The review data for each app contains an ordinal rating taking values in five categories ranging from poor to excellent as well as free-flowing text data. We select the vocabulary using tfidf scores. After simple pre-processing, the data collection contains M = 5511 apps with vocabulary size V = 3995 and total number of words Σ_{m=1}^{M} D^(m) = 1.5 × 10^6. The paper collects data from Apple's App Store but does not provide a specific link, DOI, or formal citation for this collected dataset to enable public access.
Dataset Splits | Yes | We partition available data into multiple training and test sets using 10-fold cross validation.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For the joint correlated topic model, referred to as, JTM, we bound the maximum number of active topics to K = 100, set dimensionality of the latent variables to L = 30, α0 = 1, β0 = 10^-6 and prior precision to l = L. The results are shown for λ = 0.001... The SLDA models were also computed for K = 100 and we used ζ = 1. We used 500 sweeps of sampling for inferring the topics and response parameters. For testing we used 500 sweeps of collapsed Gibbs sampling... For all the topic models we used γ = 0.01.
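
The 10-fold cross validation noted under Dataset Splits could be reproduced along the following lines. This is only a minimal sketch and not the authors' code: it assumes the partition is made over the M = 5511 apps (the paper does not say whether apps or individual reviews are split), and the function name k_fold_indices, the shuffling step, and the seed are hypothetical choices made for illustration.

```python
import random


def k_fold_indices(n_items, n_folds=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Hypothetical helper: the paper states only that 10-fold cross
    validation was used, not how the folds were constructed.
    """
    indices = list(range(n_items))
    # Shuffle with a fixed seed so the partition is repeatable.
    random.Random(seed).shuffle(indices)
    # Fold f takes every n_folds-th shuffled index starting at f,
    # so fold sizes differ by at most one.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, test


M = 5511  # number of apps reported in the paper
splits = list(k_fold_indices(M))  # assumption: folds are formed over apps
```

Each app index lands in exactly one test fold, so every app is held out once across the ten train/test pairs.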