Predictive inference is free with the jackknife+-after-bootstrap

Authors: Byol Kim, Chen Xu, Rina Barber

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical experiments verify the coverage and accuracy of the resulting predictive intervals on real data.
Researcher Affiliation | Academia | Department of Statistics, The University of Chicago, Chicago, IL 60637, USA (byolkim@uchicago.edu); H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA (cxu310@gatech.edu); Department of Statistics, The University of Chicago, Chicago, IL 60637, USA (rina@uchicago.edu)
Pseudocode | Yes | Algorithm 1 Ensemble learning. Input: data {(X_i, Y_i)}_{i=1}^n; output: ensembled regression function μ̂_φ. ... Algorithm 2 Jackknife+-after-bootstrap (J+aB). Input: data {(X_i, Y_i)}_{i=1}^n; output: predictive interval Ĉ^{J+aB}_{α,n,B}. ... Algorithm 3 Lifted J+aB residuals. Input: data {(X_i, Y_i)}_{i=1}^{n+1}; output: residuals (R_{ij} : i ≠ j ∈ {1, . . . , n+1}). (A hedged code sketch of the J+aB procedure follows the table.)
Open Source Code | Yes | The code is available online: https://www.stat.uchicago.edu/~rina/jackknife+-after-bootstrap_realdata.html
Open Datasets | Yes | We used three real data sets, which were also used in Barber et al. [2], following the same data preprocessing steps as described therein: the Communities and Crime (COMMUNITIES) data set [28] ...; the Blog Feedback (BLOG) data set [9] ...; and the Medical Expenditure Panel Survey (MEPS) 2016 data set from the Agency for Healthcare Research and Quality, with details for older versions in [13].
Dataset Splits | No | We used n = 40 observations for training, sampling uniformly without replacement to create a training-test split for each trial.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or cloud instance types) used for the experiments.
Software Dependencies | No | The paper mentions 'scikit-learn' but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | For RIDGE, we set the penalty at λ = 0.001‖X‖², where ‖X‖ is the spectral norm of the training data matrix. RF was implemented using the RandomForestRegressor method from scikit-learn, with 20 trees grown for each random forest using the mean absolute error criterion and the bootstrap option turned off, and default settings otherwise. For NN, we used the MLPRegressor method from scikit-learn with the L-BFGS solver and the logistic activation function, with default settings otherwise. For the aggregation φ, we used averaging (MEAN). We fixed α = 0.1 for a target coverage of 90%. We used n = 40 observations for training, sampling uniformly without replacement to create a training-test split for each trial. The results presented here are from 10 independent training-test splits of each data set. The ensemble wrappers J+aB and J+ENSEMBLE used sampling with replacement. We varied the size m of each bootstrap replicate as m/n = 0.2, 0.4, . . . , 1.0. For J+ENSEMBLE, we used B = 20. For the J+aB, we drew B ∼ Binomial(B̃, (1 − 1/(n+1))^m) with B̃ = [20/{(1 − 1/(n+1))^m (1 − 1/n)^m}], where [·] denotes the integer part of the argument. (A sketch of this configuration follows the table.)
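
Below is a minimal, hedged sketch of the J+aB procedure quoted in the Pseudocode row. It assumes mean aggregation (MEAN), sampling with replacement, and a decision-tree base learner; the function name and helper variables are our own scaffolding, not the paper's released code.

```python
# Sketch of Algorithms 1-2 (ensemble learning + J+aB interval), under the
# assumptions stated above. Not the authors' implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def jackknife_plus_after_bootstrap(X, Y, x_test, B=20, m=None, alpha=0.1):
    """J+aB predictive interval for a single test point x_test (1-D array)."""
    n = len(Y)
    m = m if m is not None else n
    rng = np.random.default_rng(0)

    # Algorithm 1: draw B bags with replacement, fit one base model per bag.
    bags = [rng.choice(n, size=m, replace=True) for _ in range(B)]
    models = [DecisionTreeRegressor().fit(X[idx], Y[idx]) for idx in bags]
    in_bag = np.zeros((B, n), dtype=bool)
    for b, idx in enumerate(bags):
        in_bag[b, idx] = True

    # Algorithm 2: for each training point i, aggregate only the models whose
    # bag excludes i -- the out-of-bag residuals come "for free" from the ensemble.
    lo, hi = [], []
    for i in range(n):
        oob = [b for b in range(B) if not in_bag[b, i]]
        if not oob:  # point i landed in every bag (rare); skip its residual
            continue
        mu_train = np.mean([models[b].predict(X[i].reshape(1, -1))[0] for b in oob])
        mu_test = np.mean([models[b].predict(x_test.reshape(1, -1))[0] for b in oob])
        r = abs(Y[i] - mu_train)  # out-of-bag residual R_i
        lo.append(mu_test - r)
        hi.append(mu_test + r)

    # Jackknife+ quantiles: floor(alpha*(k+1))-th smallest lower endpoint,
    # ceil((1-alpha)*(k+1))-th smallest upper endpoint.
    k = len(lo)
    lower = np.sort(lo)[max(int(np.floor(alpha * (k + 1))) - 1, 0)]
    upper = np.sort(hi)[min(int(np.ceil((1 - alpha) * (k + 1))) - 1, k - 1)]
    return lower, upper
```

For example, `jackknife_plus_after_bootstrap(X_train, Y_train, X_test[0])` returns one (lower, upper) pair targeting 90% coverage at α = 0.1.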
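
And a short sketch of the configuration quoted in the Experiment Setup row: the uniform training-test split, the ridge penalty λ = 0.001‖X‖², and the Binomial draw for the J+aB ensemble size B. The pool size N, the feature matrix, and all variable names are placeholder assumptions for illustration.

```python
# Sketch of the quoted experimental configuration; constants follow the text,
# data and names are placeholders.
import numpy as np
from sklearn.linear_model import Ridge

N, n, alpha = 200, 40, 0.1   # N is a hypothetical pool size; n = 40 as quoted
m = int(0.4 * n)             # one point on the m/n grid 0.2, 0.4, ..., 1.0
rng = np.random.default_rng(0)

# Training-test split: n training indices sampled uniformly without replacement.
perm = rng.permutation(N)
train_idx, test_idx = perm[:n], perm[n:]

# RIDGE penalty: lambda = 0.001 * ||X||^2, spectral norm of the training matrix.
X_train = rng.standard_normal((n, 5))  # placeholder features
lam = 1e-3 * np.linalg.norm(X_train, ord=2) ** 2
ridge = Ridge(alpha=lam)

# J+aB ensemble size: B ~ Binomial(B_tilde, (1 - 1/(n+1))^m), with B_tilde set
# as quoted so that roughly 20 usable bags survive the leave-one-out thinning.
p = (1 - 1 / (n + 1)) ** m
B_tilde = int(20 / (p * (1 - 1 / n) ** m))  # [.] = integer part
B = rng.binomial(B_tilde, p)
```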