Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift

Authors: Jiawei Ge, Debarghya Mukherjee, Jianqing Fan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Beyond illustrating the performance of our method through real-world datasets, we also delve into the theoretical details." and "In Section 5, we illustrate the effectiveness of our method by applying it to five different datasets: i) airfoil dataset [DG19], ii) real estate data [Yeh18], iii) energy efficiency data [TX12b], iv) appliance energy prediction data [Can17], and v) ET Dataset (ETT-small) [ZZP+21]."
Researcher Affiliation | Academia | Jiawei Ge, Operations Research & Financial Engineering, Princeton University (jg5300@princeton.edu); Debarghya Mukherjee, Department of Mathematics and Statistics, Boston University (mdeb@bu.edu); Jianqing Fan, Operations Research & Financial Engineering, Princeton University (jqfan@princeton.edu)
Pseudocode | Yes | "Algorithm 1 Prediction intervals with bounded density ratio" and "Algorithm 2 Transport map"
Open Source Code | No | The paper mentions providing open access to data and code in its NeurIPS checklist justification ('See Section 5'), but Section 5 itself describes the experimental setup and results without providing a direct link to a code repository or explicitly stating that code is available in supplementary materials.
Open Datasets | Yes | "In this section, we illustrate the effectiveness of our method by applying it to five different datasets: i) airfoil dataset [DG19], ii) real estate data [Yeh18], iii) energy efficiency data [TX12b], iv) appliance energy prediction data [Can17], and v) ET Dataset (ETT-small) [ZZP+21]. The first four datasets are freely available in the UCI repository, and the last dataset can be found in this GitHub link."
Dataset Splits | Yes | "We run experiments 200 times; each time, we randomly partition the data into two parts Dtrain and Dtest, where Dtrain contains 75% of the data, and Dtest contains 25% of the data." and "We further split D2 into two equal parts D2,1 and D2,2. D2,1, along with covariates from Dshift, is used to find the optimal aggregation of the six estimates to capture the shape, i.e., for obtaining f̂_init. The second part D2,2 is used to shrink the interval to achieve 1 − α = 0.95 coverage, i.e., to estimate λ̂(α)." (This splitting scheme is mirrored in the first sketch after the table.)
Hardware Specification | No | The main body of the paper, including Section 5 and Appendix C which describe the experiments, does not provide specific details on the hardware used (e.g., CPU/GPU models, memory). Such information appears only in the NeurIPS checklist's justification for a 'No' answer regarding compute resources: 'Mac Book Pro laptop (with M2 Max CPU, 10 Cores, 32 GB RAM, and no GPU)'.
Software Dependencies | No | The paper describes the use of various machine learning models (e.g., 'neural network', 'quantile regression forest', 'gradient boosting model', 'random forest') in Section 5, but does not specify the software libraries or their version numbers used for implementation (e.g., 'PyTorch 1.9', 'scikit-learn 1.0').
Experiment Setup | Yes | "To construct a prediction interval, we consider the model aggregation approach, i.e., the function class F is defined as the linear combination of the following six estimates: (1) Estimator 1 (f1): A neural network based estimator with depth=1, width=10... (2) Estimator 2 (f2): A fully connected feed-forward neural network with depth=2 and width=50... (3) Estimator 3 (f3): A quantile regression forest estimating the 0.9 quantile function... (4) Estimator 4 (f4): A gradient boosting model... (5) Estimator 5 (f5): An estimate of E[(Y − m̂(X))² | X = x] using random forest. (6) Estimator 6 (f6): The constant function 1." and "(1) We divide the training data Dtrain into two halves D1 and D2. We utilize dataset D1 to derive a mean estimator and the six aforementioned estimates. We also employ the covariates from D1 and Dshift to compute a density ratio estimator. (2) We further split D2 into two equal parts D2,1 and D2,2. D2,1, along with covariates from Dshift, is used to find the optimal aggregation of the six estimates to capture the shape, i.e., for obtaining f̂_init. The second part D2,2 is used to shrink the interval to achieve 1 − α = 0.95 coverage, i.e., to estimate λ̂(α)."
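
The quoted splitting scheme (Dataset Splits and Experiment Setup rows) is easy to mirror in code. Below is a minimal Python sketch using synthetic stand-in data; the helper name split_for_aggregation is hypothetical, and only the 75/25 partition, the equal halves, and the 200 repetitions come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data; the paper instead uses five public regression datasets.
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(size=1000)

def split_for_aggregation(n, rng):
    """Hypothetical helper mirroring the quoted scheme: 75% Dtrain / 25% Dtest,
    then Dtrain -> D1, D2 (equal halves), then D2 -> D2,1, D2,2 (equal halves)."""
    perm = rng.permutation(n)
    n_train = int(0.75 * n)
    train, test = perm[:n_train], perm[n_train:]
    d1, d2 = np.array_split(train, 2)   # D1: base estimators + density ratio
    d21, d22 = np.array_split(d2, 2)    # D2,1: aggregation; D2,2: shrink to 95%
    return test, d1, d21, d22

# The paper repeats the experiment 200 times with fresh random partitions.
for rep in range(200):
    test, d1, d21, d22 = split_for_aggregation(len(y), rng)
    # ... fit on X[d1], aggregate on X[d21], calibrate lambda(alpha) on X[d22] ...
```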
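The six base estimates can be approximated with off-the-shelf components. Since the paper names no libraries (see the Software Dependencies row) and truncates each estimator's description, everything below is an assumption-laden sketch: scikit-learn stands in for whatever the authors used, the quantile regression forest (f3) is swapped for gradient boosting with quantile loss, and the fitting targets for f1 and f2 are guesses.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor

def fit_base_estimates(X1, y1):
    """Assumed stand-ins for the paper's six estimates f1..f6, fit on D1."""
    m_hat = RandomForestRegressor(random_state=0).fit(X1, y1)  # mean estimator
    resid2 = (y1 - m_hat.predict(X1)) ** 2

    f1 = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X1, y1)     # depth 1, width 10
    f2 = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000).fit(X1, y1)  # depth 2, width 50
    f3 = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X1, y1)     # 0.9 quantile (QRF substitute)
    f4 = GradientBoostingRegressor().fit(X1, y1)                               # gradient boosting
    f5 = RandomForestRegressor(random_state=0).fit(X1, resid2)  # E[(Y - m_hat(X))^2 | X = x]

    def features(X):
        # Stack the six estimates columnwise; f6 is the constant function 1.
        cols = [f.predict(X) for f in (f1, f2, f3, f4, f5)]
        cols.append(np.ones(len(X)))
        return np.column_stack(cols)

    return m_hat, features

# Usage with the splits from the previous sketch:
# m_hat, features = fit_base_estimates(X[d1], y[d1])
# F_shift = features(X_shift)  # six-column design for the aggregation step
```

The aggregation weights defining f̂_init and the shrinkage factor λ̂(α) would then be fit on D2,1 and D2,2 respectively, following the paper's own algorithms (Algorithms 1 and 2), which are not reproduced here.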