Improving Expert Predictions with Conformal Prediction
Authors: Eleni Straitouri, Lequn Wang, Nastaran Okati, Manuel Gomez Rodriguez
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation experiments using synthetic and real expert predictions demonstrate that our system may help experts make more accurate predictions and is robust to the accuracy of the classifier the conformal predictor relies on. (Section headings: 5. Experiments on Synthetic Data; 6. Experiments on Real Data) |
| Researcher Affiliation | Academia | (1) Max Planck Institute for Software Systems, Kaiserslautern, Germany; (2) Department of Computer Science, Cornell University, Ithaca, United States. |
| Pseudocode | Yes | Algorithm 1: Finding a near-optimal α̂ (see the first sketch after this table) |
| Open Source Code | Yes | An open-source implementation of our system is available at https://github.com/Networks-Learning/improve-expertpredictions-conformal-prediction. |
| Open Datasets | Yes | We experiment with the dataset CIFAR-10H (Peterson et al., 2019), which contains 10,000 natural images taken from the test set of the standard CIFAR-10 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | For each prediction task, we generate 10,000 samples, pick 20% of these samples at random as test set, which we use to estimate the performance of our system, and also randomly split the remaining 80% into three disjoint subsets for training, calibration, and estimation, whose sizes we vary across experiments (see the second sketch after this table). |
| Hardware Specification | Yes | All algorithms ran on a Debian machine equipped with Intel Xeon E5-2667 v4 @ 3.2 GHz, 32GB memory and two M40 Nvidia Tesla GPU cards. |
| Software Dependencies | Yes | To implement our algorithms and run all the experiments on synthetic and real data, we used PyTorch 1.12.1, NumPy 1.20.1 and Scikit-learn 1.0.2 on Python 3.9.2. |
| Experiment Setup | Yes | We create a variety of synthetic prediction tasks, each with 20 features per sample and a varying number of label values n and difficulty. For each prediction task, we train a logistic regression model Pθ(Y | X)... and we use three popular and highly accurate deep neural network classifiers trained on CIFAR-10, namely ResNet110 (He et al., 2016a), PreResNet-110 (He et al., 2016b) and DenseNet (Huang et al., 2017) (see the third sketch after this table). |
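
The Pseudocode row above refers to Algorithm 1, which finds a near-optimal coverage level α̂ so that an expert who must pick a label from the conformal prediction set is as accurate as possible. The following is a minimal Python/NumPy sketch of that idea, not the authors' implementation: it assumes a classifier that outputs class probabilities, a toy expert model given by a per-class confidence vector `expert_conf`, and a simple grid of candidate α values; the names `conformal_threshold`, `find_alpha_hat` and `alpha_grid` are illustrative.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))/n empirical quantile
    of the nonconformity scores 1 - p(true label) on the calibration set."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    k = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
    return scores[max(k, 0)]

def prediction_set(probs, threshold):
    """All labels whose nonconformity score falls below the calibrated threshold."""
    return np.where(1.0 - probs <= threshold)[0]

def estimated_expert_accuracy(est_probs, est_labels, threshold, expert_conf):
    """Estimate how often a toy expert picks the true label when restricted to the
    conformal prediction set; expert_conf[y] is the expert's relative confidence in y."""
    hits = []
    for probs, y in zip(est_probs, est_labels):
        cset = prediction_set(probs, threshold)
        if y not in cset:
            hits.append(0.0)  # true label excluded from the set -> expert cannot succeed
        else:
            hits.append(expert_conf[y] / (expert_conf[y] + len(cset) - 1))
    return float(np.mean(hits))

def find_alpha_hat(cal_probs, cal_labels, est_probs, est_labels, expert_conf,
                   alpha_grid=np.linspace(0.01, 0.5, 50)):
    """Pick the alpha whose prediction sets give the highest estimated expert accuracy."""
    return max(alpha_grid,
               key=lambda a: estimated_expert_accuracy(
                   est_probs, est_labels,
                   conformal_threshold(cal_probs, cal_labels, a), expert_conf))
```

The authors' algorithm and their expert model differ in detail; the sketch only shows the overall structure of calibrating prediction sets and then selecting α by estimated expert accuracy on a held-out estimation set.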
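
For the Dataset Splits row, a short sketch of the split described there, assuming scikit-learn's `train_test_split` and placeholder data; the paper varies the relative sizes of the training, calibration and estimation subsets across experiments, whereas the sketch simply uses an equal three-way split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))      # placeholder: 10,000 samples, 20 features each
y = rng.integers(0, 10, size=10_000)   # placeholder labels

# 20% of the samples are held out as the test set used to evaluate the system.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The remaining 80% is divided into three disjoint subsets (training, calibration,
# estimation); their relative sizes vary in the paper, here an equal split is used.
X_train, X_rem, y_train, y_rem = train_test_split(X_rest, y_rest, test_size=2 / 3, random_state=0)
X_cal, X_est, y_cal, y_est = train_test_split(X_rem, y_rem, test_size=0.5, random_state=0)
```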
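
For the Experiment Setup row, a hedged sketch of one synthetic prediction task and the logistic regression model Pθ(Y | X) trained on it. scikit-learn's `make_classification` and its `class_sep` parameter are used here as stand-ins for the paper's data generator and difficulty knob, which are not reproduced exactly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def make_synthetic_task(n_labels, class_sep, n_samples=10_000, seed=0):
    """Synthetic task with 20 features per sample and n_labels classes;
    class_sep stands in for the task's 'difficulty' (illustrative only)."""
    return make_classification(
        n_samples=n_samples, n_features=20, n_informative=20, n_redundant=0,
        n_classes=n_labels, class_sep=class_sep, random_state=seed,
    )

X, y = make_synthetic_task(n_labels=10, class_sep=1.0)
clf = LogisticRegression(max_iter=1000).fit(X, y)  # the classifier P_theta(Y | X)
probs = clf.predict_proba(X)                       # class probabilities for the conformal predictor
```

In the real-data experiments the same role is played by the pretrained CIFAR-10 classifiers (ResNet110, PreResNet-110, DenseNet), whose softmax outputs feed the conformal predictor in the same way as the probabilities above.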