Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression
Authors: Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, James M. Rehg
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the use of CT-HMMs with more than 100 states to visualize and predict disease progression using a glaucoma dataset and an Alzheimer's disease dataset. Sec. 5 (Experimental results): We evaluated our EM algorithms in simulation (Sec. 5.1) and on two real-world datasets: a glaucoma dataset (Sec. 5.2), in which we compare our prediction performance to a state-of-the-art method, and a dataset for Alzheimer's disease (AD, Sec. 5.3), where we compare visualized progression trends to recent findings in the literature. |
| Researcher Affiliation | Academia | Yu-Ying Liu, Shuang Li, Fuxin Li, Le Song, and James M. Rehg College of Computing Georgia Institute of Technology Atlanta, GA |
| Pseudocode | Yes | Algorithm 1: CT-HMM Parameter Learning (Soft/Hard); Algorithm 2: The Expm Algorithm for Computing End-State Conditioned Statistics |
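The end-state conditioned statistics named in Algorithm 2 are, for an interval of length $t$ with $X(0)=a$ and $X(t)=b$, the expected holding time $E[\tau_i] = \frac{1}{P_{ab}(t)}\int_0^t P_{ai}(s)P_{ib}(t-s)\,ds$ and the expected transition count $E[n_{ij}] = \frac{q_{ij}}{P_{ab}(t)}\int_0^t P_{ai}(s)P_{jb}(t-s)\,ds$, where $P(t) = e^{Qt}$. The Python sketch below evaluates these integrals with Van Loan's block-matrix exponential identity, which we assume is the idea behind the "Expm" name; it is not the authors' code. The function names `end_state_conditioned_stats` and `m_step` are hypothetical, the M-step assumes the standard closed-form CTMC update $q_{ij} = E[n_{ij}]/E[\tau_i]$, and the code relies on `scipy.linalg.expm`.

```python
import numpy as np
from scipy.linalg import expm


def end_state_conditioned_stats(Q, t, a, b):
    """Expected holding times tau[i] and transition counts n_trans[i, j] of a
    CTMC with generator Q on [0, t], given X(0) = a and X(t) = b.

    Van Loan's identity: the upper-right n x n block of
    expm([[Q, B], [0, Q]] * t) equals the integral over s in [0, t] of
    expm(Q * s) @ B @ expm(Q * (t - s)).
    """
    n = Q.shape[0]
    p_ab = expm(Q * t)[a, b]          # P(X(t) = b | X(0) = a)
    zeros = np.zeros((n, n))
    tau = np.zeros(n)
    n_trans = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and Q[i, j] == 0.0:
                continue              # no i -> j edge, so E[n_ij] = 0
            B = zeros.copy()
            B[i, j] = 1.0             # picks out P_ai(s) * P_jb(t - s)
            A = np.block([[Q, B], [zeros, Q]])
            integral = expm(A * t)[:n, n:][a, b]
            if i == j:
                tau[i] = integral / p_ab
            else:
                n_trans[i, j] = Q[i, j] * integral / p_ab
    return tau, n_trans


def m_step(tau_total, n_total):
    """Closed-form CTMC update q_ij = E[n_ij] / E[tau_i]; diagonal = -row sum.
    Assumes every state has a positive expected holding time."""
    Q_new = n_total / tau_total[:, None]
    np.fill_diagonal(Q_new, 0.0)
    np.fill_diagonal(Q_new, -Q_new.sum(axis=1))
    return Q_new
```

This handles a single interval with known endpoints. In a soft EM setting, the statistics from every observation interval would be summed, each weighted by the posterior probability of its endpoint pair $(a, b)$, before calling `m_step`. The naive double loop calls `expm` on a $2n \times 2n$ matrix $O(n^2)$ times and is only meant to illustrate the identity, not to match the paper's efficiency.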
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its source code. |
| Open Datasets | Yes | We evaluated our EM algorithms in simulation (Sec. 5.1) and on two real-world datasets: a glaucoma dataset (Sec. 5.2) ... and an Alzheimer's disease dataset (AD, Sec. 5.3). The Alzheimer's Disease Neuroimaging Initiative (ADNI), http://adni.loni.usc.edu |
| Dataset Splits | No | The paper mentions synthetic data simulation and general use of 'training set' and 'testing patient' but does not provide specific percentages, sample counts, or citations for train/validation/test splits on the real-world datasets. |
| Hardware Specification | No | On the glaucoma dataset from Section 5.2, using a model with 105 states, Soft Expm requires 18 minutes per iteration on a 2.67 GHz machine with unoptimized MATLAB code. This gives only a CPU clock speed; it does not specify the processor make or model, the amount of RAM, or any GPU. |
| Software Dependencies | No | The paper mentions "unoptimized MATLAB code" but does not specify the version of MATLAB or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We test the accuracy of all methods on a 5-state complete digraph with synthetic data generated under different noise levels. Each $q_i$ is randomly drawn from $[1, 5]$, and then each $q_{ij}$ is drawn from $[0, 1]$ and renormalized such that $\sum_{j \neq i} q_{ij} = q_i$. The state chains are generated from $Q$ such that each chain has a total duration of around $T = \frac{100}{\min_i q_i}$, where $\frac{1}{\min_i q_i}$ is the largest mean holding time. The data emission model for state $i$ is set as $\mathcal{N}(i, \sigma^2)$, where $\sigma$ varies under different noise level settings. The observations are then sampled from the state chains with rate $\frac{0.5}{\max_i q_i}$, where $\frac{1}{\max_i q_i}$ is the smallest mean holding time, which should be dense enough to make the chain identifiable. A total of $10^5$ observations are sampled. The convergence threshold is $10^{-8}$ on the relative change in data likelihood. |
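The synthetic-data protocol above can be sketched in Python with NumPy. This is an illustrative reconstruction, not the authors' (MATLAB) code, and several details are assumptions: the function names are hypothetical; states are 0-indexed, so the emission mean of state $i$ is $i$; the initial state of each chain is drawn uniformly (the setup does not say); every chain lasts exactly $T$ rather than "around" $T$; and the sampling "rate" $0.5/\max_i q_i$ is read as the spacing between consecutive observations.

```python
import numpy as np


def random_generator(n_states, rng):
    """Complete-digraph generator: q_i ~ U[1, 5]; off-diagonal q_ij ~ U[0, 1],
    rescaled so that sum_{j != i} q_ij = q_i; diagonal Q[i, i] = -q_i."""
    q = rng.uniform(1.0, 5.0, size=n_states)
    Q = np.zeros((n_states, n_states))
    for i in range(n_states):
        w = rng.uniform(0.0, 1.0, size=n_states)
        w[i] = 0.0
        Q[i] = q[i] * w / w.sum()
        Q[i, i] = -q[i]
    return Q


def simulate_chain(Q, duration, rng):
    """Simulate a CTMC path on [0, duration); returns jump times and states."""
    n = Q.shape[0]
    state = int(rng.integers(n))      # assumption: uniform initial state
    times, states = [0.0], [state]
    t = 0.0
    while True:
        rate = -Q[state, state]
        t += rng.exponential(1.0 / rate)  # holding time ~ Exp(q_i)
        if t >= duration:
            break
        probs = Q[state].copy()
        probs[state] = 0.0
        state = int(rng.choice(n, p=probs / probs.sum()))  # jump prob q_ij / q_i
        times.append(t)
        states.append(state)
    return np.array(times), np.array(states)


def sample_observations(times, states, duration, dt, sigma, rng):
    """Observe the path every dt time units; emission for state i is N(i, sigma^2)."""
    obs_times = np.arange(0.0, duration, dt)
    idx = np.searchsorted(times, obs_times, side="right") - 1
    hidden = states[idx]
    return obs_times, hidden, rng.normal(hidden.astype(float), sigma)


def generate_dataset(Q, n_obs_total, sigma, rng):
    """Generate chains until at least n_obs_total observations are collected."""
    q = -np.diag(Q)
    duration = 100.0 / q.min()        # T = 100 / min_i q_i
    dt = 0.5 / q.max()                # spacing 0.5 / max_i q_i
    sequences, count = [], 0
    while count < n_obs_total:
        times, states = simulate_chain(Q, duration, rng)
        seq = sample_observations(times, states, duration, dt, sigma, rng)
        sequences.append(seq)
        count += len(seq[0])
    return sequences


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = random_generator(5, rng)
    data = generate_dataset(Q, 10**5, sigma=0.5, rng=rng)  # sigma is arbitrary here
    print(len(data), sum(len(s[0]) for s in data))
```

Because `generate_dataset` stops after the chain that crosses the threshold, the total can slightly exceed `n_obs_total`; trimming the last sequence would give exactly $10^5$ observations if that matters.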