Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Conformal Prediction Intervals with Temporal Dependence

Authors: Zhen Lin, Shubhendu Trivedi, Jimeng Sun

TMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Through a set of experiments, we will first verify the validity of both CPTD-M and CPTD-R, as well as the efficiency (average width of the PIs). Then, more importantly, we will verify our assumption that ignoring the temporal dependence will lead to some TS being consistently under/over-covered, and that CPTD-M and CPTD-R improve the longitudinal coverage by appropriately adjusting the nonconformity scores with additional information.
Researcher Affiliation Academia Zhen Lin, University of Illinois at Urbana-Champaign, Urbana, IL 61801, EMAIL; Shubhendu Trivedi, EMAIL; Jimeng Sun, University of Illinois at Urbana-Champaign, Urbana, IL 61801, EMAIL. During the initiation and pursuance of this research, the author's primary affiliation was MIT.
Pseudocode Yes Algorithm 1: Ratio-to-median-residual Normalization (CPTD-R). Input: {y_{i,s}}, i ∈ [N], s ∈ [t]: responses on the calibration set and the test TS up to t; {ŷ_{i,s}}, i ∈ [N+1], s ∈ [t+1]: predictions on the calibration set and the test TS up to t+1. Output: {m̂_{i,t+1}}: normalization factors for the nonconformity scores at t+1. Procedure: ∀i ∈ [N+1], s ∈ [t], compute r_{i,s} ← |y_{i,s} − ŷ_{i,s}| and m_s ← median_i{|r_{i,s}|}. ∀i ∈ [N+1], estimate the overall rank q̂_{i,t+1} using Eq. 17. Compute the empirical distribution of the median-normalized residuals {nr_{i,t}}_{i=1}^{N+1} using Eq. 15. ∀i ∈ [N+1], look up the normalizer m̂^R_{i,t+1} using Eq. 16.
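The median-residual step of Algorithm 1 can be sketched as below. This is only the first stage of CPTD-R: the rank estimate (Eq. 17), the empirical distribution (Eq. 15), and the normalizer look-up (Eq. 16) are paper-specific and omitted, and the array shapes and function name are assumptions, not taken from the paper's code.

```python
import numpy as np

def cptd_r_residual_step(y, y_hat):
    """Compute residuals, per-step medians, and median-normalized residuals.

    y:     shape (N+1, t)   -- responses on calibration set + test TS up to t
    y_hat: shape (N+1, t+1) -- predictions up to t+1

    Returns (r, m, nr), the inputs consumed by Eqs. 15-17 of CPTD-R.
    """
    r = np.abs(y - y_hat[:, : y.shape[1]])  # r_{i,s} = |y_{i,s} - yhat_{i,s}|
    m = np.median(r, axis=0)                # m_s = median_i |r_{i,s}|
    nr = r / m                              # median-normalized residuals
    return r, m, nr
```

The normalizer m̂^R_{i,t+1} would then be looked up from the empirical distribution of `nr` at the estimated rank, per the algorithm's remaining steps.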
Open Source Code Yes Our code is available at https://github.com/zlin7/CPTD.
Open Datasets Yes MIMIC: Electronic health records data for White Blood Cell Count (WBCC) prediction (Johnson et al. (2016); Goldberger et al. (2000); Johnson et al. (2019)). COVID19: COVID-19 case prediction in the United Kingdom (UK) (COVID). The cross-section is along different regions in UK. EEG: Electroencephalography trajectory prediction after visual stimuli (UCI EEG). Load: Utility (electricity) load forecasting (Hong et al. (2016)).
Dataset Splits Yes A summary of each dataset is in Table 2. Table 2: Size of each dataset, and the length of the time series. # train/cal/test: MIMIC 192/100/100; Insurance 2393/500/500; COVID19 200/100/80; EEG 300/100/200; Load/Load-R 1198/200/700. For Load, we perform a strict temporal split (test data is preceded by calibration data, which is preceded by the training data), which means the exchangeability is broken. We also include a Load-R (random) version that preserves exchangeability by ignoring the temporal order in data splitting.
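The strict temporal split used for Load (versus the exchangeability-preserving random split of Load-R) can be sketched as follows; the function name and signature are hypothetical, not from the paper's code.

```python
import numpy as np

def split_series(n, n_train, n_cal, strict_temporal=True, seed=0):
    """Split n time-series indices into (train, cal, test).

    strict_temporal=True: test follows cal follows train in time,
    which breaks exchangeability (the paper's Load split).
    strict_temporal=False: shuffle first, preserving exchangeability
    (the Load-R variant).
    """
    idx = np.arange(n)
    if not strict_temporal:
        idx = np.random.default_rng(seed).permutation(idx)
    return idx[:n_train], idx[n_train:n_train + n_cal], idx[n_train + n_cal:]
```

With the Load sizes from Table 2 (1198/200/700), `split_series(2098, 1198, 200)` puts every training index before every calibration index, and every calibration index before every test index.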
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models or cloud instance types used for running experiments.
Software Dependencies No The paper mentions "ADAM" (Kingma & Ba (2015)) as an optimizer and "LSTM" (Hochreiter & Schmidhuber (1997)) as a base model, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup Yes We use an LSTM (Hochreiter & Schmidhuber, 1997) as the base time series regression model (mean estimator) for all methods. We use ADAM (Kingma & Ba, 2015) as the optimizer with a learning rate of 10^-3, and MSE loss. The LSTM has one layer and a hidden size of 32, and is trained for 200, 1000, 100, 500, and 1000 epochs on MIMIC, COVID19, EEG, Insurance, and Load, respectively. For QRNN, we replace the MSE loss with the quantile loss.
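A minimal PyTorch sketch of the reported configuration (one-layer LSTM, hidden size 32, ADAM with learning rate 10^-3, MSE loss, quantile loss for QRNN). The input/output dimensionality and the `quantile_loss` helper are assumptions for illustration, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """One-layer LSTM mean estimator with hidden size 32, per the paper;
    input_dim=1 is an assumption."""
    def __init__(self, input_dim=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, input_dim)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)      # (batch, time) point predictions

def quantile_loss(y_hat, y, q=0.5):
    """Pinball loss used in place of MSE for the QRNN variant."""
    diff = y - y_hat
    return torch.mean(torch.maximum(q * diff, (q - 1) * diff))

model = LSTMRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, lr 10^-3
loss_fn = nn.MSELoss()                                     # MSE for mean estimator
```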