Training Uncertainty-Aware Classifiers with Conformalized Deep Learning

Authors: Bat-Sheva Einbinder, Yaniv Romano, Matteo Sesia, Yanfei Zhou

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the LLM response cited as evidence.

Research Type: Experimental
LLM Response: Experiments with synthetic and real data demonstrate this method can lead to smaller conformal prediction sets with higher conditional coverage, after exact calibration with hold-out data, compared to state-of-the-art alternatives.

Researcher Affiliation: Academia
LLM Response: Bat-Sheva Einbinder, Faculty of Electrical & Computer Engineering (ECE), Technion, Israel, bat-shevab@campus.technion.ac.il; Yaniv Romano, Faculty of ECE and of Computer Science, Technion, Israel, yromano@technion.ac.il; Matteo Sesia, Department of Data Sciences and Operations, University of Southern California, Los Angeles, California, USA, sesia@marshall.usc.edu; Yanfei Zhou, Department of Data Sciences and Operations, University of Southern California, Los Angeles, California, USA, yanfei.zhou@marshall.usc.edu

Pseudocode: Yes
LLM Response: Algorithm 1: Conformalized uncertainty-aware training of deep multi-class classifiers

Open Source Code: Yes
LLM Response: A more technically detailed version of Algorithm 1 is provided in Appendix A1.2, and an open-source software implementation of this method is available online at https://github.com/bat-sheva/conformal-learning.

Open Datasets: Yes
LLM Response: Convolutional neural networks guided by the conformal loss are trained on the publicly available CIFAR-10 image classification data set [81] (10 classes)...

Dataset Splits: Yes
LLM Response: For this purpose, we generate an additional validation set of 2000 independent data points and use it to preview the out-of-sample accuracy and loss value at each epoch. (A hypothetical data-preparation sketch consistent with this split appears after the table.)

Hardware Specification: Yes
LLM Response: For example, training a conformal loss model on 45000 images in the CIFAR-10 data set took us approximately 20 hours on an Nvidia P100 GPU.

Software Dependencies: No
LLM Response: The paper mentions PyTorch [79] but does not specify a version number for it or any other software dependency.

Experiment Setup: Yes
LLM Response: Input: data {(Xi, Yi)}, i = 1, ..., n; hyper-parameter λ ∈ [0, 1]; learning rate γ > 0; batch size M. Randomly initialize the model parameters θ(0). Randomly split the data into two disjoint subsets I1, I2 such that I1 ∪ I2 = [n]. Set the number of batches to B = (n/2)/M (assuming for simplicity that |I1| = |I2|). For t = 1, ..., T do ... (An illustrative training-loop sketch of this setup follows the table.)
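
The Open Datasets and Dataset Splits entries above describe the CIFAR-10 data and the held-out validation set of 2000 points. The following is a minimal sketch of how such a preparation could look with torchvision and PyTorch; the transform, batch sizes, exact split sizes, and seeds are illustrative assumptions, not the authors' choices (their implementation is at the GitHub link above).

import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Illustrative transform; the paper likely also applies data augmentation.
transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)

# Hold out 2000 points as the validation set used to preview out-of-sample
# accuracy and loss at each epoch, then split the remainder into the two
# disjoint subsets I1 and I2 required by Algorithm 1 (sizes are assumptions).
generator = torch.Generator().manual_seed(0)
n_val = 2000
n_rest = len(full_train) - n_val
val_set, rest = random_split(full_train, [n_val, n_rest], generator=generator)
i1_set, i2_set = random_split(rest, [n_rest // 2, n_rest - n_rest // 2],
                              generator=generator)

loader_i1 = DataLoader(i1_set, batch_size=128, shuffle=True)
loader_i2 = DataLoader(i2_set, batch_size=128, shuffle=True)
loader_val = DataLoader(val_set, batch_size=256)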
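
The Experiment Setup entry quotes the inputs of Algorithm 1. The sketch below illustrates, under stated assumptions, how one epoch of uncertainty-aware training in this spirit could be written in PyTorch: cross-entropy is computed on batches from I1, while batches from I2 feed a term that pushes conformity scores of the true labels toward a uniform distribution, with λ balancing the two. The particular conformity score, uniformity penalty, and weighting are assumptions made for illustration; the authors' detailed version of Algorithm 1 is in Appendix A1.2 and in the repository linked above.

import torch
import torch.nn.functional as F

def conformity_scores(probs, labels):
    # Cumulative probability mass of classes ranked at least as likely as the
    # true class (one common adaptive score; assumed here for illustration).
    order = probs.argsort(dim=1, descending=True)
    sorted_probs = probs.gather(1, order)
    cumulative = sorted_probs.cumsum(dim=1)
    true_rank = (order == labels.unsqueeze(1)).float().argmax(dim=1)
    return cumulative.gather(1, true_rank.unsqueeze(1)).squeeze(1)

def uniformity_loss(scores):
    # Penalize the distance between the empirical distribution of the scores
    # and Uniform(0, 1), comparing sorted scores to an evenly spaced grid.
    s, _ = torch.sort(scores)
    n = s.shape[0]
    grid = (torch.arange(1, n + 1, dtype=s.dtype, device=s.device) - 0.5) / n
    return torch.mean((s - grid) ** 2)

def train_one_epoch(model, loader_i1, loader_i2, optimizer, lam=0.5):
    # One epoch of the assumed conformal training objective:
    # (1 - lam) * cross-entropy on I1 + lam * uniformity penalty on I2.
    model.train()
    for (x1, y1), (x2, y2) in zip(loader_i1, loader_i2):
        optimizer.zero_grad()
        ce = F.cross_entropy(model(x1), y1)
        probs2 = F.softmax(model(x2), dim=1)
        unif = uniformity_loss(conformity_scores(probs2, y2))
        loss = (1.0 - lam) * ce + lam * unif
        loss.backward()
        optimizer.step()

Note that this only sketches the training objective; as the Research Type entry quotes, prediction sets are still built afterwards by exact calibration with hold-out data.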