Multi-Source Conformal Inference Under Distribution Shift
Authors: Yi Liu, Alexander Levis, Sharon-Lise Normand, Larry Han
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our proposed method by conducting extensive Monte Carlo simulations, examining aspects such as marginal coverage, conditional coverage, and the width of the prediction interval. [...] Hospital length of stay prediction intervals for pediatric patients undergoing a high-risk cardiac surgical procedure between 2016-2022 in the U.S. illustrate the utility of our methodology. |
| Researcher Affiliation | Academia | ¹North Carolina State University, Department of Statistics, Raleigh, NC, USA; ²Carnegie Mellon University, Department of Statistics, Pittsburgh, PA, USA; ³Harvard Medical School, Department of Health Care Policy, Boston, MA, USA; ⁴Northeastern University, Department of Health Sciences, Boston, MA, USA. |
| Pseudocode | Yes | Algorithm 1 Robust multi-source conformal prediction |
| Open Source Code | Yes | We provide a user-friendly R function `MuSCI()` implementing the proposed method with an illustrative example, available at: https://github.com/yiliu1998/Multi-Source-Conformal. |
| Open Datasets | Yes | We utilize data from the Society of Thoracic Surgeons Congenital Heart Surgery Database (STS-CHSD) which includes audited preoperative, intraoperative, and early postoperative information (Overman et al., 2019) from U.S. congenital heart surgery centers. |
| Dataset Splits | Yes | Split the training data $D$ randomly into $D_1$ and $D_2$, where $D_j = \{O_i \in D, i \in I_j\}$ for $j = 1, 2$ and $I_1 \cup I_2 = \{1, 2, \ldots, n\}$. [...] We perform cross-fitting such that the nuisance estimators $(\hat{m}, \hat{\eta}, \hat{q}_0)$ are estimated on an independent data split from the given estimating equation. [...] $\lambda$ is a tuning parameter chosen by cross-validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Super Learner, random forest, elastic net, GLM, and R, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | In total, we consider 3 sample sizes (300, 1000, 3000) × 3 levels of covariate shift (homogeneous, weakly heterogeneous, strongly heterogeneous) × 2 types of outcome errors (homoskedastic, heteroskedastic) × 3 levels of concept shift (CCOD holds, weak violation, strong violation) × 3 different conformal scores (ASR, locally weighted ASR, CQR) = 162 scenarios for our proposed method and the five competitor methods. [...] We generate data from $K = 5$ sites, where site 0 is the target site and sites 1 through 4 are source sites, and $T_i \in \{0, \ldots, 4\}$ denotes the site of subject $i$. Our goal is to construct valid prediction intervals for a testing point from the target site. We consider the sample size in each site to be $n_k \in \{300, 1000, 3000\}$, $k = 0, \ldots, 4$ and generate data over $M = 500$ independent Monte Carlo replications. [...] $\lambda$ is a tuning parameter chosen by cross-validation. |
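
The factorial design and the two-way data split quoted in the table can be illustrated with a short Python sketch. This is not the authors' code (their implementation is the R function in the linked repository); the variable names, the 0-based indexing, the fixed seed, and the equal-size split are all assumptions made here for illustration, since the quoted text does not state the split proportions.

```python
import itertools
import random

# Factor levels as listed in the Experiment Setup row.
sample_sizes = [300, 1000, 3000]
covariate_shift = ["homogeneous", "weakly heterogeneous", "strongly heterogeneous"]
outcome_errors = ["homoskedastic", "heteroskedastic"]
concept_shift = ["CCOD holds", "weak violation", "strong violation"]
conformal_scores = ["ASR", "locally weighted ASR", "CQR"]

# Full factorial grid: one tuple per simulation scenario.
scenarios = list(
    itertools.product(
        sample_sizes, covariate_shift, outcome_errors, concept_shift, conformal_scores
    )
)


def split_indices(n, seed=0):
    """Randomly partition indices 0..n-1 into two disjoint sets I1 and I2.

    Assumption: an equal-size split; the source only says the split is random.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return sorted(idx[:half]), sorted(idx[half:])


I1, I2 = split_indices(300)
print(len(scenarios), len(I1), len(I2))
```

The grid size follows from 3 × 3 × 2 × 3 × 3, and the two index sets are disjoint and together cover every training index, matching the condition on $I_1$ and $I_2$ quoted in the Dataset Splits row.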