Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression

Authors: Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, James M. Rehg

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the use of CT-HMMs with more than 100 states to visualize and predict disease progression using a glaucoma dataset and an Alzheimer's disease dataset. 5 Experimental results: We evaluated our EM algorithms in simulation (Sec. 5.1) and on two real-world datasets: a glaucoma dataset (Sec. 5.2), in which we compare our prediction performance to a state-of-the-art method, and a dataset for Alzheimer's disease (AD, Sec. 5.3), where we compare visualized progression trends to recent findings in the literature.
Researcher Affiliation | Academia | Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, and James M. Rehg, College of Computing, Georgia Institute of Technology, Atlanta, GA
Pseudocode | Yes | Algorithm 1: CT-HMM Parameter Learning (Soft/Hard); Algorithm 2: The Expm Algorithm for Computing End-State Conditioned Statistics (a minimal sketch of the expm construction follows after the table).
Open Source Code | No | The paper does not provide a direct link or an explicit statement about the availability of its source code.
Open Datasets | Yes | We evaluated our EM algorithms in simulation (Sec. 5.1) and on two real-world datasets: a glaucoma dataset (Sec. 5.2) ... and an Alzheimer's disease dataset (AD, Sec. 5.3). The Alzheimer's Disease Neuroimaging Initiative, http://adni.loni.usc.edu
Dataset Splits | No | The paper mentions synthetic data simulation and general use of a 'training set' and 'testing patient', but does not provide specific percentages, sample counts, or citations for train/validation/test splits on the real-world datasets.
Hardware Specification | No | On the glaucoma dataset from Section 5.2, using a model with 105 states, Soft Expm requires 18 minutes per iteration on a 2.67 GHz machine with unoptimized MATLAB code. This gives a CPU frequency but does not specify a make, model, or other hardware components such as GPU or RAM.
Software Dependencies | No | The paper mentions "unoptimized MATLAB code" but does not specify the version of MATLAB or any other software dependencies with version numbers.
Experiment Setup | Yes | We test the accuracy of all methods on a 5-state complete digraph with synthetic data generated under different noise levels. Each q_i is randomly drawn from [1, 5], and each q_ij is drawn from [0, 1] and renormalized so that Σ_{j≠i} q_ij = q_i. The state chains are generated from Q such that each chain has a total duration of around T = 100 / min_i q_i, where 1 / min_i q_i is the largest mean holding time. The data emission model for state i is set as N(i, σ²), where σ varies under different noise-level settings. The observations are then sampled from the state chains with rate 0.5 / max_i q_i, where 1 / max_i q_i is the smallest mean holding time, which should be dense enough to make the chain identifiable. A total of 10^5 observations are sampled. The convergence threshold is 10^-8 on the relative change in data likelihood. (A rough code reconstruction of this setup follows after the table.)
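
Algorithm 2 in the paper computes end-state conditioned statistics, e.g., the expected time spent in a state over an interval given the states observed at both endpoints, via a matrix-exponential construction. The snippet below is a minimal sketch of that idea using Van Loan's auxiliary-matrix trick with NumPy/SciPy; the function name, arguments, and normalization shown are our own illustration, not the authors' code.

```python
import numpy as np
from scipy.linalg import expm

def end_state_conditioned_duration(Q, t, i, k, l):
    """Expected time spent in state i during [0, t], conditioned on the
    chain starting in state k and ending in state l.

    Van Loan's construction: for A = [[Q, B], [0, Q]] with B = e_i e_i^T,
    the upper-right block of expm(A t) equals
    int_0^t expm(Q s) B expm(Q (t - s)) ds.
    Dividing its (k, l) entry by [expm(Q t)]_{k, l} gives the conditioned
    expectation.
    """
    n = Q.shape[0]
    B = np.zeros((n, n))
    B[i, i] = 1.0
    A = np.block([[Q, B], [np.zeros((n, n)), Q]])
    integral = expm(A * t)[:n, n:]   # upper-right n x n block
    P_t = expm(Q * t)                # transition probabilities over [0, t]
    return integral[k, l] / P_t[k, l]
```

Expected end-state conditioned transition counts follow the same pattern, with B replaced by q_ij e_i e_j^T.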
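
The synthetic setup quoted in the Experiment Setup row can be mirrored in a few lines. The sketch below is a rough reconstruction under our reading of that description (the variable names, the single noise level σ = 0.25, and the regular-grid sampling are our own choices): it draws a random 5-state generator matrix Q, simulates one continuous-time state chain, and samples Gaussian observations from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 5

# Generator matrix: q_i ~ U[1, 5]; off-diagonal rates q_ij ~ U[0, 1],
# renormalized so that sum_{j != i} q_ij = q_i, with Q_ii = -q_i.
q = rng.uniform(1, 5, size=n_states)
Q = rng.uniform(0, 1, size=(n_states, n_states))
np.fill_diagonal(Q, 0.0)
Q *= (q / Q.sum(axis=1))[:, None]
np.fill_diagonal(Q, -q)

# Simulate one chain of total duration ~ 100 / min_i q_i
# (i.e., 100 times the largest mean holding time).
T = 100.0 / q.min()
times, states = [0.0], [int(rng.integers(n_states))]
while times[-1] < T:
    i = states[-1]
    dwell = rng.exponential(1.0 / q[i])           # holding time in state i
    jump_probs = np.clip(Q[i], 0.0, None) / q[i]  # next-state distribution
    times.append(times[-1] + dwell)
    states.append(int(rng.choice(n_states, p=jump_probs)))

# Observations N(state, sigma^2) on a regular grid with spacing
# 0.5 / max_i q_i (half the smallest mean holding time).
sigma = 0.25
t_obs = np.arange(0.0, T, 0.5 / q.max())
idx = np.searchsorted(times, t_obs, side="right") - 1
obs = rng.normal(loc=np.asarray(states)[idx], scale=sigma)
```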