Probabilistic Imputation for Time-series Classification with Missing Data
Authors: Seunghyun Kim, Hyunsu Kim, Eunggu Yun, Hwangrae Lee, Jaehun Lee, Juho Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on real-world time series data with missing values, we demonstrate the effectiveness of our method. In this section, we demonstrate our method on real-world multivariate time series data with missing values. We compare ours to the baselines on three datasets: MIMIC-III (Johnson et al., 2016), PhysioNet 2012 (Silva et al., 2012), and Human Activity Recognition (Anguita et al., 2013). |
| Researcher Affiliation | Collaboration | (1) Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea; (2) Saige Research, Seoul, South Korea; (3) Samsung Research, Seoul, South Korea; (4) AITRICS, Seoul, South Korea. |
| Pseudocode | No | No pseudocode or algorithm blocks are present. |
| Open Source Code | No | Reproducibility statement: "Please refer to Appendix A for full experimental detail including datasets, models, and evaluation metrics." This statement does not mention source code availability. |
| Open Datasets | Yes | We use three irregularly sampled time series datasets to evaluate the classification and imputation performance of our model and baseline models: PhysioNet 2012 (Silva et al., 2012), MIMIC-III (Johnson et al., 2016), and Human Activity Recognition (Anguita et al., 2013). For comparability, we employ the Python package `medical_ts_datasets` (Horn et al., 2020), which provides a unified data preprocessing pipeline for the PhysioNet 2012 and MIMIC-III datasets. (A minimal loading sketch follows this table.) |
| Dataset Splits | No | We employ early stopping for all classification experiments. We set the early-stopping patience to 20 epochs and use the validation Area Under the ROC Curve (AUROC) as the early-stopping criterion. While early stopping implies a validation set, the paper does not explicitly state the split ratio or the methodology used to create the train/validation/test sets. It mentions using `medical_ts_datasets` but neither states what splits that package uses nor defines custom splits. (An early-stopping sketch follows this table.) |
| Hardware Specification | No | No specific hardware details are mentioned in the paper. |
| Software Dependencies | No | For all the classification experiments, we fix a batch size of 128. We adopt Adam with weight decay as the optimizer and find the best weight decay for each model using grid search. We employ the Python package `medical_ts_datasets` (Horn et al., 2020). No version numbers for these or any other software libraries are provided. |
| Experiment Setup | Yes | For all the classification experiments, we fix a batch size of 128. We adopt Adam with weight decay as the optimizer and find the best weight decay for each model using grid search. We employ early stopping for all classification experiments, with a patience of 20 epochs and the validation Area Under the ROC Curve (AUROC) as the stopping criterion. See Table 6 for hyperparameter settings of our model and the baseline methods for all classification experiments. Table 6 lists hyperparameters such as `weight_decay: 0.0`, `n_train_latents: 10`, `n_train_samples: 1`, `n_test_latents: 20`, `n_test_samples: 30`, `n_hidden: 128`, `z_dim: 32`, `n_units: 128`, and `observe_dropout: 0.4`. (An optimizer-setup sketch follows this table.) |
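The Open Datasets row cites the `medical_ts_datasets` package for the PhysioNet 2012 and MIMIC-III preprocessing pipeline. Below is a minimal loading sketch, assuming the package registers its datasets with TensorFlow Datasets as described by Horn et al. (2020); the dataset name `physionet2012` and the split names are illustrative assumptions, not details confirmed by the paper.

```python
# Hedged sketch: assumes medical_ts_datasets registers its datasets with
# TensorFlow Datasets, as in Horn et al. (2020). The dataset and split
# names below are illustrative assumptions.
import tensorflow_datasets as tfds
import medical_ts_datasets  # noqa: F401 -- importing registers the datasets

# Load the PhysioNet 2012 task with its package-defined splits.
train_ds, info = tfds.load('physionet2012', split='train', with_info=True)
val_ds = tfds.load('physionet2012', split='validation')
test_ds = tfds.load('physionet2012', split='test')

print(info.splits)  # inspect the split sizes the package defines
```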
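The Dataset Splits row quotes the early-stopping setup: patience of 20 epochs on validation AUROC. The loop below is a minimal sketch of that criterion in PyTorch-style training code; `train_one_epoch`, `evaluate_auroc`, and `max_epochs` are hypothetical placeholders for routines the paper does not specify.

```python
import copy

# Hedged sketch of the stated criterion: stop when validation AUROC has not
# improved for 20 consecutive epochs, then restore the best weights.
# train_one_epoch, evaluate_auroc, and max_epochs are hypothetical placeholders.
patience, wait = 20, 0
best_auroc, best_state = float('-inf'), None

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)
    auroc = evaluate_auroc(model, val_loader)  # e.g. sklearn.metrics.roc_auc_score
    if auroc > best_auroc:
        best_auroc, wait = auroc, 0
        best_state = copy.deepcopy(model.state_dict())  # checkpoint best model
    else:
        wait += 1
        if wait >= patience:  # no improvement for 20 epochs: stop
            break

model.load_state_dict(best_state)
```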
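The Experiment Setup row states that Adam with weight decay is the optimizer, batch size is 128, and the best weight decay is found per model by grid search. The sketch below shows one way to express that setup with PyTorch's `AdamW` (decoupled weight decay); the candidate grid, learning rate, and the `build_model` / `train_and_validate` helpers are assumptions, since the quoted text does not specify them.

```python
import torch

# Hedged sketch of the stated setup: Adam with weight decay, batch size 128,
# weight decay chosen by grid search on validation AUROC. The grid, learning
# rate, and helper functions are illustrative assumptions.
batch_size = 128
weight_decay_grid = [0.0, 1e-5, 1e-4, 1e-3]  # hypothetical candidates

best_wd, best_val_auroc = None, float('-inf')
for wd in weight_decay_grid:
    model = build_model()  # placeholder for the paper's classifier
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=wd)
    val_auroc = train_and_validate(model, optimizer, batch_size=batch_size)
    if val_auroc > best_val_auroc:
        best_val_auroc, best_wd = val_auroc, wd

print(f'selected weight_decay = {best_wd}')  # Table 6 reports 0.0 for the proposed model
```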