Particle Gibbs for Infinite Hidden Markov Models

Authors: Nilesh Tripuraneni, Shixiang (Shane) Gu, Hong Ge, Zoubin Ghahramani

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In the following experiments we explore the performance of the PG sampler on both the iHMM and the sticky iHMM. Note that throughout this section we have only taken N = 10 and N = 50 particles for the PG sampler, which has time complexity O(TNK) when sampling from the posterior, compared to the time complexity O(TK^2) of the beam sampler. For completeness, we also compare to the Gibbs sampler, which has been shown to perform worse than the beam sampler [Van Gael et al., 2008] due to strong correlations in the latent states."
Researcher Affiliation | Academia | Nilesh Tripuraneni* (University of Cambridge, nt357@cam.ac.uk); Shixiang Gu* (University of Cambridge and MPI for Intelligent Systems, sg717@cam.ac.uk); Hong Ge (University of Cambridge, hg344@cam.ac.uk); Zoubin Ghahramani (University of Cambridge, zoubin@eng.cam.ac.uk)
Pseudocode | Yes |
Step 1: For iteration $t = 1$ initialize as:
(a) sample $s_1^i \sim q_1(\cdot)$ for $i = 1, \ldots, N$.
(b) initialize weights $w_1^i = \frac{p(s_1^i) f_1(y_1 \mid s_1^i)}{q_1(s_1^i)}$ for $i = 1, \ldots, N$.
Step 2: For iteration $t > 1$ use the trajectory $s'_{1:T}$ from iteration $t - 1$, together with $\beta$, $\pi$, $\phi$, and $K$:
(a) sample the index $a_{t-1}^i \sim \mathrm{Cat}(\cdot \mid W_{t-1}^{1:N})$ of the ancestor of particle $i$, for $i = 1, \ldots, N - 1$.
(b) sample $s_t^i \sim q_t(\cdot \mid s_{t-1}^{a_{t-1}^i})$ for $i = 1, \ldots, N - 1$. If $s_t^i = K + 1$, create a new state using the stick-breaking construction for the HDP:
  (i) sample a new transition probability vector $\pi_{K+1} \sim \mathrm{Dir}(\alpha \beta)$.
  (ii) use the stick-breaking construction to iteratively expand $\beta \leftarrow [\beta, \beta_{K+1}]$ as $\beta'_{K+1} \overset{\text{iid}}{\sim} \mathrm{Beta}(1, \gamma)$, $\beta_{K+1} = \beta'_{K+1} \prod_{\ell=1}^{K} (1 - \beta'_{\ell})$.
  (iii) expand the transition probability vectors $\pi_j$, $j = 1, \ldots, K + 1$, to include transitions to the $(K+1)$-st state via the HDP stick-breaking construction, $\pi_j \leftarrow [\pi_{j1}, \pi_{j2}, \ldots, \pi_{j,K+1}]$, with $\pi'_{j,K+1} \sim \mathrm{Beta}\big(\alpha_0 \beta_{K+1}, \alpha_0 (1 - \sum_{\ell=1}^{K+1} \beta_{\ell})\big)$ and $\pi_{j,K+1} = \pi'_{j,K+1} \prod_{\ell=1}^{K} (1 - \pi'_{j\ell})$.
  (iv) sample a new emission parameter $\phi_{K+1} \sim H$.
(c) compute the ancestor weights $w_{t-1 \mid T}^i = w_{t-1}^i \, \pi(s'_t \mid s_{t-1}^i)$ and resample $a_t^N$ as $P(a_t^N = i) \propto w_{t-1 \mid T}^i$.
(d) recompute and normalize the particle weights using $w_t(s_t^i) = \pi(s_t^i \mid s_{t-1}^{a_{t-1}^i}) f(y_t \mid s_t^i) / q_t(s_t^i \mid s_{t-1}^{a_{t-1}^i})$ and $W_t(s_t^i) = w_t(s_t^i) / \sum_{i=1}^{N} w_t(s_t^i)$.
Step 3: Sample $k$ with $P(k = i) \propto w_T^i$ and return $s'_{1:T} = s_{1:T}^k$. (A runnable sketch of this sweep, under stated assumptions, follows the table.)
Open Source Code | No | The paper does not provide any specific links or statements about the availability of open-source code for the described methodology.
Open Datasets | Yes | "First, as in Van Gael et al. [2008], we generate sequences of length 4000 from a 4-state HMM with self-transition probability of 0.75, and residual probability mass distributed uniformly over the remaining states, where the emission distributions are taken to be normal with fixed standard deviation 0.5 and emission means of -2.0, -0.5, 1.0, 4.0 for the 4 states." (A data-generation sketch follows the table.)
Dataset Splits | No | The paper describes training and testing on specific data lengths (e.g., 1000 training and 4000 test characters for "Alice in Wonderland"), but it does not specify explicit train/validation/test splits, percentages, or a cross-validation setup for general reproducibility.
Hardware Specification | No | The paper does not report the hardware (e.g., CPU/GPU models, memory) used to run its experiments; it only discusses the computational complexity of the algorithm.
Software Dependencies | No | The paper does not specify any software dependencies or version numbers; it refers only to the use of the Particle Gibbs sampler and the beam sampler.
Experiment Setup | Yes | "Note that throughout this section we have only taken N = 10 and N = 50 particles for the PG sampler, which has time complexity O(TNK) when sampling from the posterior, compared to the time complexity O(TK^2) of the beam sampler."
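The following is a minimal, self-contained sketch of the particle Gibbs sweep from the pseudocode above, written for a *fixed* truncation level K: the HDP stick-breaking expansion of Step 2(b), and the Gibbs updates of pi, beta, and phi between sweeps, are omitted; a bootstrap proposal q_t = pi is assumed; and all names (pgas_sweep, loglik, etc.) are illustrative, not taken from the authors' code.

```python
import numpy as np

def pgas_sweep(y, s_ref, pi, mu, sigma, N=10, rng=None):
    """One conditional SMC sweep with ancestor sampling (Steps 1-3 above).

    y      : (T,) observations
    s_ref  : (T,) reference trajectory s'_{1:T} from the previous iteration
    pi     : (K, K) transition matrix (finite truncation; the HDP expansion
             of Step 2(b) is omitted in this sketch)
    mu     : (K,) Gaussian emission means; sigma: shared emission std
    Returns a freshly sampled trajectory s_{1:T}.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, K = len(y), pi.shape[0]
    s = np.empty((T, N), dtype=int)

    def loglik(t, states):
        # log f(y_t | s_t), up to constants that cancel in normalization
        return -0.5 * ((y[t] - mu[states]) / sigma) ** 2

    def normalize(lw):
        w = np.exp(lw - lw.max())
        return w / w.sum()

    # Step 1: bootstrap initialization (uniform prior and proposal cancel,
    # so the weight is the likelihood); particle N is pinned to s_ref.
    s[0, :-1] = rng.integers(0, K, size=N - 1)
    s[0, -1] = s_ref[0]
    logw = loglik(0, s[0])

    for t in range(1, T):
        # Step 2(a): resample ancestors for particles 1..N-1.
        a = rng.choice(N, size=N, p=normalize(logw))
        # Step 2(c): ancestor-sample the reference particle's ancestor,
        # w_{t-1|T}^i proportional to w_{t-1}^i * pi(s'_t | s_{t-1}^i).
        logw_anc = logw + np.log(pi[s[t - 1], s_ref[t]] + 1e-300)
        a[-1] = rng.choice(N, p=normalize(logw_anc))
        s[:t] = s[:t][:, a]  # reindex particle histories by ancestry
        # Step 2(b): propagate from the prior (bootstrap proposal q_t = pi).
        for i in range(N - 1):
            s[t, i] = rng.choice(K, p=pi[s[t - 1, i]])
        s[t, -1] = s_ref[t]
        # Step 2(d): with the bootstrap proposal the weight is the likelihood.
        logw = loglik(t, s[t])

    # Step 3: return one trajectory drawn in proportion to the final weights.
    return s[:, rng.choice(N, p=normalize(logw))]
```

In the full sampler this sweep alternates with Gibbs updates of the transition rows and emission parameters given the sampled trajectory; the ancestor-sampling step 2(c) is what allows the chain to mix even with as few as N = 10 particles, as reported in the quoted experiments.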
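For the synthetic benchmark quoted in the Open Datasets row, a data-generation sketch is straightforward. This assumes the emission means are (-2.0, -0.5, 1.0, 4.0), i.e., that the minus signs on the first two values were lost in text extraction; the function name generate_hmm_data is illustrative.

```python
import numpy as np

def generate_hmm_data(T=4000, seed=0):
    """Sample (states, observations) from the 4-state benchmark HMM:
    self-transition 0.75, residual mass uniform over the other states,
    Normal emissions with standard deviation 0.5."""
    rng = np.random.default_rng(seed)
    K, self_p = 4, 0.75
    # Transition matrix: 0.75 on the diagonal, rest spread uniformly.
    pi = np.full((K, K), (1 - self_p) / (K - 1))
    np.fill_diagonal(pi, self_p)
    mu, sigma = np.array([-2.0, -0.5, 1.0, 4.0]), 0.5  # signs assumed; see lead-in
    s = np.empty(T, dtype=int)
    s[0] = rng.integers(K)
    for t in range(1, T):
        s[t] = rng.choice(K, p=pi[s[t - 1]])
    y = rng.normal(mu[s], sigma)
    return s, y
```

The sampler is then run on the observations alone, and the number of inferred states can be compared against the true K = 4.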