Probabilistic Imputation for Time-series Classification with Missing Data
Authors: Seunghyun Kim, Hyunsu Kim, Eunggu Yun, Hwangrae Lee, Jaehun Lee, Juho Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on real-world time series data with missing values, we demonstrate the effectiveness of our method. In this section, we demonstrate our method on real-world multivariate time series data with missing values. We compare ours to the baselines on three datasets: MIMIC-III (Johnson et al., 2016), PhysioNet 2012 (Silva et al., 2012), and Human Activity Recognition (Anguita et al., 2013). |
| Researcher Affiliation | Collaboration | (1) Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea; (2) Saige Research, Seoul, South Korea; (3) Samsung Research, Seoul, South Korea; (4) AITRICS, Seoul, South Korea. |
| Pseudocode | No | No pseudocode or algorithm blocks are present. |
| Open Source Code | No | Reproducibility statement: "Please refer to Appendix A for full experimental detail including datasets, models, and evaluation metrics." This statement does not mention source code availability. |
| Open Datasets | Yes | We use three irregularly sampled time series datasets to evaluate the classification and imputation performance of our model and baseline models: PhysioNet 2012 (Silva et al., 2012), MIMIC-III (Johnson et al., 2016), and Human Activity Recognition (Anguita et al., 2013). For comparability, we employ the Python package `medical_ts_datasets` (Horn et al., 2020), which provides a unified data preprocessing pipeline for the PhysioNet 2012 and MIMIC-III datasets. (A minimal loading sketch follows this table.) |
| Dataset Splits | No | We employ early stopping for all classification experiments. We set the early-stopping patience to 20 epochs and use the validation Area Under the ROC Curve (AUROC) as the early-stopping criterion. While early stopping implies a validation set, the paper does not explicitly state the split ratio or the methodology used to create the train/validation/test sets. It mentions using `medical_ts_datasets` but neither states what splits that package uses nor defines custom splits. (An early-stopping sketch follows this table.) |
| Hardware Specification | No | No specific hardware details are mentioned in the paper. |
| Software Dependencies | No | For all the classification experiments, we fix a batch size of 128. We adopt Adam with weight decay as the optimizer and find the best weight decay for each model using grid search. We employ the Python package `medical_ts_datasets` (Horn et al., 2020). No version numbers for these or any other software libraries are provided. |
| Experiment Setup | Yes | For all the classification experiments, we fix a batch size of 128. We adopt Adam with weight decay as the optimizer and find the best weight decay for each model using grid search. We employ early stopping for all classification experiments, with a patience of 20 epochs and the validation Area Under the ROC Curve (AUROC) as the stopping criterion. See Table 6 for hyperparameter settings of our model and the baseline methods for all classification experiments. Table 6 lists hyperparameters such as `weight_decay: 0.0`, `n_train_latents: 10`, `n_train_samples: 1`, `n_test_latents: 20`, `n_test_samples: 30`, `n_hidden: 128`, `z_dim: 32`, `n_units: 128`, and `observe_dropout: 0.4`. (An optimizer-setup sketch follows this table.) |
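The Open Datasets row cites the `medical_ts_datasets` package for the PhysioNet 2012 and MIMIC-III preprocessing pipeline. Below is a minimal loading sketch, assuming the package registers its datasets with TensorFlow Datasets as described by Horn et al. (2020); the dataset name `physionet2012` and the split names are illustrative assumptions, not details confirmed by the paper.

```python
# Hedged sketch: assumes medical_ts_datasets registers its datasets with
# TensorFlow Datasets, as in Horn et al. (2020). The dataset and split
# names below are illustrative assumptions.
import tensorflow_datasets as tfds
import medical_ts_datasets  # noqa: F401 -- importing registers the datasets

# Load the PhysioNet 2012 task with its package-defined splits.
train_ds, info = tfds.load('physionet2012', split='train', with_info=True)
val_ds = tfds.load('physionet2012', split='validation')
test_ds = tfds.load('physionet2012', split='test')

print(info.splits)  # inspect the split sizes the package defines
```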
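The Dataset Splits row quotes the early-stopping setup: patience of 20 epochs on validation AUROC. The loop below is a minimal sketch of that criterion in PyTorch-style training code; `train_one_epoch`, `evaluate_auroc`, and `max_epochs` are hypothetical placeholders for routines the paper does not specify.

```python
import copy

# Hedged sketch of the stated criterion: stop when validation AUROC has not
# improved for 20 consecutive epochs, then restore the best weights.
# train_one_epoch, evaluate_auroc, and max_epochs are hypothetical placeholders.
patience, wait = 20, 0
best_auroc, best_state = float('-inf'), None

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)
    auroc = evaluate_auroc(model, val_loader)  # e.g. sklearn.metrics.roc_auc_score
    if auroc > best_auroc:
        best_auroc, wait = auroc, 0
        best_state = copy.deepcopy(model.state_dict())  # checkpoint best model
    else:
        wait += 1
        if wait >= patience:  # no improvement for 20 epochs: stop
            break

model.load_state_dict(best_state)
```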
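The Experiment Setup row states that Adam with weight decay is the optimizer, batch size is 128, and the best weight decay is found per model by grid search. The sketch below shows one way to express that setup with PyTorch's `AdamW` (decoupled weight decay); the candidate grid, learning rate, and the `build_model` / `train_and_validate` helpers are assumptions, since the quoted text does not specify them.

```python
import torch

# Hedged sketch of the stated setup: Adam with weight decay, batch size 128,
# weight decay chosen by grid search on validation AUROC. The grid, learning
# rate, and helper functions are illustrative assumptions.
batch_size = 128
weight_decay_grid = [0.0, 1e-5, 1e-4, 1e-3]  # hypothetical candidates

best_wd, best_val_auroc = None, float('-inf')
for wd in weight_decay_grid:
    model = build_model()  # placeholder for the paper's classifier
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=wd)
    val_auroc = train_and_validate(model, optimizer, batch_size=batch_size)
    if val_auroc > best_val_auroc:
        best_val_auroc, best_wd = val_auroc, wd

print(f'selected weight_decay = {best_wd}')  # Table 6 reports 0.0 for the proposed model
```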