Clustering Interval-Censored Time-Series for Disease Phenotyping
Authors: Irene Y. Chen, Rahul G. Krishnan, David Sontag
AAAI 2022, pp. 6211-6221
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On synthetic data, we demonstrate accurate, stable, and interpretable results that outperform several benchmarks. On real-world clinical datasets of heart failure and Parkinson's disease patients, we study how interval censoring can adversely affect the task of disease phenotyping. Our model corrects for this source of error and recovers known clinical subtypes. |
| Researcher Affiliation | Academia | Irene Y. Chen1, Rahul G. Krishnan2, David Sontag1 1MIT CSAIL and IMES 2University of Toronto iychen@csail.mit.edu, rahulgk@cs.toronto.edu, dsontag@csail.mit.edu |
| Pseudocode | Yes | Figure 1(c) describes the graphical model, and Algorithm 1 depicts the pseudocode for this procedure. |
| Open Source Code | No | The paper provides links to open-source implementations of *baseline* methods (e.g., SuStaIn, PAGA, DTW, SPARTan) but does not state the availability of, or provide access to, open-source code for its own proposed method, SubLign. |
| Open Datasets | Yes | Parkinson's disease (PD): We use publicly-available data from the Parkinson's Progression Markers Initiative (PPMI), an observational clinical study, totalling Nt = 423 PD patients and Nc = 196 healthy controls where N = Nt + Nc. |
| Dataset Splits | Yes | We evaluate models on 5 trials, each with a different randomized data split and random seed. For each trial, we learn on a training set (60%), find the best performance across all hyperparameters on the validation set (20%), and report the performance metrics on the held-out test set (20%). |
| Hardware Specification | Yes | Our models are implemented in Python 3.7 using PyTorch (Paszke et al. 2019) and are learned via Adam (Kingma and Ba 2014) on a single NVIDIA K80 GPU for 1000 epochs. |
| Software Dependencies | No | The paper mentions 'Python 3.7' and 'PyTorch' but does not give version numbers for PyTorch or the other key libraries/solvers needed for reproduction; only the Python interpreter itself is pinned, which prevents fully reconstructing the software environment. |
| Experiment Setup | Yes | We find optimal hyperparameters via grid search. For both synthetic and clinical experiments, we search over hyperparameters including dimensions of the latent space z (2, 5, 10), the number of hidden units in the RNN (50, 100, 200), the number of hidden units in the multi-layer perceptron (50, 100, 200), the learning rate (0.001, 0.01, 0.1, 1.), regularization parameter (0., 0.1, 1.), and regularization type (L1, L2). We set alignment extrema δ+ = 10 based on the maximum of the synthetic dataset and the maxima of the HF and PD datasets. We search over 50 time steps with ϵ = 0.1. For all models, we run for 1000 epochs... |
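The evaluation protocol quoted in the table above (per-trial randomized 60/20/20 splits, then a grid search over the listed hyperparameters on the validation set) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and the `evaluate` callback are hypothetical, and only the split fractions and grid values come from the paper.

```python
import itertools
import random

def split_indices(n, seed, fractions=(0.6, 0.2, 0.2)):
    """Randomized train/validation/test split for one trial (60/20/20 in the paper)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Hyperparameter grid reported in the Experiment Setup row.
GRID = {
    "latent_dim": [2, 5, 10],
    "rnn_hidden": [50, 100, 200],
    "mlp_hidden": [50, 100, 200],
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "reg_weight": [0.0, 0.1, 1.0],
    "reg_type": ["l1", "l2"],
}

def grid_search(evaluate, grid=GRID):
    """Exhaustively evaluate every configuration; return the best by validation score.

    `evaluate` is a hypothetical callback: it trains on the training split and
    returns a validation-set score (higher is better) for one configuration.
    """
    best_cfg, best_score = None, float("-inf")
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With the grid above there are 3 × 3 × 3 × 4 × 3 × 2 = 648 configurations per trial; the paper repeats this over 5 trials with different seeds and reports test-set metrics for the best validation configuration.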