Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Clustering Hidden Markov Models with Variational HEM
Authors: Emanuele Coviello, Antoni B. Chan, Gert R.G. Lanckriet
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The benefits of the proposed algorithm, which we call variational HEM (VHEM), are demonstrated on several tasks involving time-series data, such as hierarchical clustering of motion capture sequences, and automatic annotation and retrieval of music and of online hand-writing data, showing improvements over current methods. [...] In this section, we present an empirical study of the VHEM-H3M algorithm for clustering and hierarchical clustering of HMMs. [...] In this section, we present an empirical study of VHEM-H3M for density estimation, in automatic annotation and retrieval of music (Section 6.1) and hand-written digits (Section 6.2). |
| Researcher Affiliation | Academia | Emanuele Coviello (EMAIL), Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Antoni B. Chan (EMAIL), Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong; Gert R.G. Lanckriet (EMAIL), Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093, USA |
| Pseudocode | Yes | Algorithm 1 (VHEM algorithm for H3Ms): 1: Input: base H3M M^(b) = {ω^(b)_i, M^(b)_i}, i = 1, …, K^(b); number of virtual samples N. 2: Initialize reduced H3M M^(r) = {ω^(r)_j, M^(a)_j}, j = 1, …, K^(r). 3: repeat 4: {Variational E-step} 5: Compute optimal variational distributions and variational lower bounds: 6: for each pair of HMMs M^(s)_i and M^(r)_j: 7: for each pair of emission GMMs for state β of M^(s)_i and ρ of M^(r)_j: 8: compute optimal variational distributions η̂^{(i,β),(j,ρ)}_{ℓ\|m} as in (18); 9: compute optimal lower bound L^{(i,β),(j,ρ)}_{GMM} to the expected log-likelihood as in (22). 10: Compute optimal variational distributions for HMMs as in Appendix B: φ̂^{i,j}_1(ρ_1\|β_1) and φ̂^{i,j}_t(ρ_t\|ρ_{t−1}, β_t) for t = τ, …, 2. 11: Compute optimal lower bound L^{i,j}_{HMM} to the expected log-likelihood as in (21). 12: Compute optimal assignment probabilities: ẑ_{ij} = ω^(r)_j exp(N ω^(b)_i L^{i,j}_{HMM}) / Σ_{j′} ω^(r)_{j′} exp(N ω^(b)_i L^{i,j′}_{HMM}). 13: Compute aggregate summary statistics for each pair of HMMs M^(s)_i and M^(r)_j as in Section 3.3.3: ν̂^{i,j}_1(ρ) = Σ_β ν^{i,j}_1(ρ, β); ν̂^{i,j}(ρ, β) = Σ_t ν^{i,j}_t(ρ, β); ξ̂^{i,j}(ρ, ρ′) = Σ_t Σ_β ξ^{i,j}_t(ρ, ρ′, β). 14: {M-step} 15: For each component M^(r)_j, recompute parameters using (24)–(28). 16: until convergence 17: Output: reduced H3M {ω^(r)_j, M^(a)_j}, j = 1, …, K^(r). |
| Open Source Code | No | E.C. thanks Yingbo Song for providing the code for the PPK similarity between HMMs (Jebara et al., 2007) and assistance with the experiments on motion clustering. (This refers to code provided by a third-party for an external method, not the authors' own implementation of the VHEM algorithm.) |
| Open Datasets | Yes | We experiment on two motion capture data sets, the Mo Cap data set (http://mocap.cs.cmu.edu/) and the Vicon Physical Action data set (Theodoridis and Hu, 2007; Asuncion and Newman, 2010). [...] We consider the CAL500 collection from Turnbull et al. (2008), which consists of 502 songs and provides binary annotations with respect to a vocabulary V of 149 tags [...] We consider the Character Trajectories Data Set (Asuncion and Newman, 2010), which is a collection of 2858 examples of characters from the same writer |
| Dataset Splits | Yes | The data has been numerically differentiated and Gaussian smoothed (Williams et al., 2006). Half of the data is used for training, with the other half held out for testing. [...] All reported metrics are averages over the 97 tags that have at least 30 examples in CAL500 (11 genre, 14 instrument, 25 acoustic quality, 6 vocal characteristics, 35 emotion and 6 usage tags), and are the result of 5-fold cross-validation. |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models, memory) are provided. The paper mentions running on a "cluster" (e.g., "due to the computational limits of our cluster") but no specifications. |
| Software Dependencies | No | No specific software versions (e.g., Python 3.8, PyTorch 1.9) are mentioned. The paper describes algorithms and models, but not the software environment used for implementation with version numbers. |
| Experiment Setup | Yes | In our experiments, we set the number of virtual samples to N = 10^4 K^(ℓ−1), a large value that favors hard clustering (where each HMM is univocally assigned to a single cluster), and the length of the virtual sequences to τ = 10. [...] The first level (Level 1) was formed by the K1 = 56 HMMs learned from each individual motion example (with S = 4 hidden states, and M = 2 components for each GMM emission). [...] Then, for each tag, the song-level HMMs labeled with that tag are pooled together to form a large H3M, and the VHEM-H3M algorithm is used to reduce this to a final H3M tag-model with K = 3 components (τ = 10 and N = Nv Nt K^(s), where Nv = 1000 and Nt is the number of training songs for the particular tag). [...] we estimate a series of class-conditional H3M distributions, one for each character, using hierarchical estimation with the VHEM-H3M algorithm. First, for each character, we partition all the relevant training data into groups of 3 sequences, and learn a HMM (with S = 4 states and M = 1 component for the GMM emissions) from each group using the Baum-Welch algorithm. Next, we estimate the class-conditional distribution (classification model) for each character by aggregating all the relevant HMMs and summarizing them into a H3M with K = 4 components using the VHEM-H3M algorithm (τ = 10 and N = Nv Nc, where Nv = 10,21 and Nc is the number of intermediate models for that character). [...] In particular, the algorithms were terminated when the relative variation in the value of the objective function between two consecutive iterations was lower than a threshold ε_LL, which we varied in {10^-2, 10^-3, 10^-4, 10^-5}. |
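As an illustration of the pseudocode quoted above, step 12 of Algorithm 1 computes soft assignments of base HMMs to reduced-model components via a weighted softmax over the variational lower bounds. The sketch below is not the authors' implementation; the function name and array layout are assumptions, and the lower-bound matrix `L_hmm` is taken as given (computing it requires the full variational machinery of the paper). Computation is done in log space for numerical stability, since the exponent N ω^(b)_i L^{i,j}_{HMM} is large when N is large.

```python
import numpy as np

def vhem_assignments(L_hmm, w_base, w_reduced, N):
    """Soft assignments z_hat[i, j] of base HMM i to reduced component j,
    following step 12 of the quoted Algorithm 1:

        z_hat[i, j] ∝ w_reduced[j] * exp(N * w_base[i] * L_hmm[i, j])

    normalized over j. Evaluated in log space with a max-shift
    (log-sum-exp trick) to avoid overflow for large N.
    """
    L_hmm = np.asarray(L_hmm, dtype=float)          # shape (K_b, K_r)
    w_base = np.asarray(w_base, dtype=float)        # shape (K_b,)
    w_reduced = np.asarray(w_reduced, dtype=float)  # shape (K_r,)

    # log of the unnormalized assignment for each (i, j) pair
    log_z = np.log(w_reduced)[None, :] + N * w_base[:, None] * L_hmm
    # normalize over the reduced components j, stably
    log_z -= log_z.max(axis=1, keepdims=True)
    z = np.exp(log_z)
    return z / z.sum(axis=1, keepdims=True)
```

With a large N (the report quotes N = 10^4 K^(ℓ−1) in the setup row), the softmax sharpens toward a hard assignment of each base HMM to its best-matching reduced component, consistent with the paper's remark that large N favors hard clustering.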