Collapsed variational Bayes for Markov jump processes

Authors: Boqian Zhang, Jiangwei Pan, Vinayak A. Rao

NeurIPS 2017

Reproducibility assessment (variable, result, and supporting LLM response for each item):
Research Type: Experimental
LLM Response: "We apply our ideas to synthetic data as well as a dataset of check-in recordings, where we demonstrate superior performance over state-of-the-art MCMC methods." "We present qualitative and quantitative experiments using synthetic and real datasets to demonstrate the accuracy and efficiency of our variational Bayes (VB) algorithm." "Datasets. We use a dataset of check-in sequences from 8967 Four Square users in the year 2011."
Researcher Affiliation: Academia
LLM Response: "Jiangwei Pan, Department of Computer Science, Duke University (panjiangwei@gmail.com); Boqian Zhang, Department of Statistics, Purdue University (zhan1977@purdue.edu); Vinayak Rao, Department of Statistics, Purdue University (varao@purdue.edu)."
Pseudocode: Yes
LLM Response: The paper describes the Gillespie algorithm steps: "1. First, at time t = 0, sample an initial state s0 from π. 2. From here onwards, upon entering a new state i, sample the time of the next transition from an exponential with rate |A_ii|, and then a new state j ≠ i with probability proportional to A_ij." It also describes its variational inference algorithm in structured steps: "1) Updating q(U|T) = ∏_{t=1}^{|T|} q(u_t): Given a discretization T and an Ω, uniformization tells us that inference over U is just inference for a discrete-time hidden Markov model. ... 2) Updating q(T): We perform a greedy search over the space of time-discretizations by making local stochastic updates to the current T. Every iteration, we first scan the current T to find a beneficial merge... If no merge is found, we then try to find a beneficial split."
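The Gillespie steps quoted above map directly onto a short simulation routine. Below is a minimal sketch in Python; the function name and the example rate matrix are illustrative, not from the paper:

```python
import numpy as np

def gillespie_mjp(A, pi, t_end, rng=None):
    """Sample a Markov jump process trajectory on [0, t_end].

    A  : (n, n) rate matrix; off-diagonal A[i, j] >= 0, A[i, i] = -sum of row i's off-diagonals.
    pi : (n,) initial state distribution.
    Returns jump times and the states entered at those times.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = A.shape[0]
    s = rng.choice(n, p=pi)                        # 1. at t = 0, sample s0 from pi
    t, times, states = 0.0, [0.0], [s]
    while True:
        t += rng.exponential(1.0 / abs(A[s, s]))   # 2. holding time ~ Exponential(|A_ss|)
        if t >= t_end:
            break
        rates = A[s].copy()                        # next state j != s, prob. proportional to A[s, j]
        rates[s] = 0.0
        s = rng.choice(n, p=rates / rates.sum())
        times.append(t)
        states.append(s)
    return times, states

# Illustrative usage with a 3-state rate matrix:
A = np.array([[-1.0, 0.6, 0.4],
              [0.5, -1.5, 1.0],
              [0.3, 0.7, -1.0]])
times, states = gillespie_mjp(A, pi=np.ones(3) / 3, t_end=10.0)
```

Sampled this way, the trajectory is piecewise constant: states[k] holds from times[k] until the next jump time.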
Open Source Code: No
LLM Response: No statement or link is provided indicating that the source code for the methodology is openly available.
Open Datasets: Yes
LLM Response: "We use a dataset of check-in sequences from 8967 Four Square users in the year 2011, originally collected by Gao et al. (2012) for studying location-based social networks."
Dataset Splits: No
LLM Response: The paper states: "We randomly select 100 test sequences, and randomly hold out half of the observations in each test sequence. The training data consists of the observations that are not held out, i.e., 100 full sequences and 100 half sequences." This specifies a train/test partition, but no separate validation split is described.
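As a concrete reading of the quoted split, here is a hedged sketch of the procedure; the sequences variable is a stand-in for the per-user check-in lists, and only the 100-sequence and hold-out-half figures come from the quote:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 8967 per-user check-in sequences (illustrative data only).
sequences = [[f"venue_{k}" for k in range(20)] for _ in range(8967)]

# Randomly select 100 test sequences; hold out half of each one's observations.
test_ids = set(rng.choice(len(sequences), size=100, replace=False).tolist())
train, held_out = [], []
for i, seq in enumerate(sequences):
    if i in test_ids:
        hold = rng.permutation(len(seq)) < len(seq) // 2
        held_out.append([o for o, h in zip(seq, hold) if h])
        train.append([o for o, h in zip(seq, hold) if not h])
    else:
        train.append(list(seq))
```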
Hardware Specification: No
LLM Response: No specific hardware details (such as GPU/CPU models or cloud instance types) are provided.

Software Dependencies: No
LLM Response: The paper does not list software dependencies or version numbers.
Experiment Setup: Yes
LLM Response: "For VB on synthetic datasets we place a Gamma(20, 2) prior on Ω, and Dirichlet(2) priors on the transition probabilities and the observation probabilities, while on the check-in data, a Gamma(6, 1), a Dirichlet(0.1) and a Dirichlet(0.01) are placed. For MCMC on synthetic datasets, we place a Gamma(2, 0.2) and a Dirichlet(0.1) for the rate matrix, while on the check-in data, a Gamma(1, 1) and a Dirichlet(0.1) are placed. We run VB on the first synthetic dataset for 200 iterations, after which we use the posterior expected counts of observations in each state to infer the output emission probabilities. We run the VB algorithm on the check-in data using 50 states for 200 iterations."
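Read as configuration, the quoted hyperparameters map onto standard Gamma and Dirichlet draws. A minimal sketch of the synthetic-data VB priors follows; the shape/rate convention for the Gamma and the state and observation counts used here are assumptions, not stated in the quote:

```python
import numpy as np

rng = np.random.default_rng()
n_states, n_obs = 3, 5   # illustrative sizes; the check-in runs use 50 states

# Synthetic-data VB priors from the quote (Gamma taken in shape/rate form):
omega = rng.gamma(shape=20.0, scale=1.0 / 2.0)                 # Omega ~ Gamma(20, 2)
trans = rng.dirichlet(np.full(n_states, 2.0), size=n_states)   # transition rows ~ Dirichlet(2)
emit = rng.dirichlet(np.full(n_obs, 2.0), size=n_states)       # emission rows ~ Dirichlet(2)
```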