Leveraging Uncertainty Estimates To Improve Classifier Performance
Authors: Gundeep Arora, Srujana Merugu, Anoop Saladi, Rajeev Rastogi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation of the proposed algorithms on three real-world datasets yields a 25%-40% gain in recall at high precision bounds over the traditional approach of using the model score alone, highlighting the benefits of leveraging uncertainty. |
| Researcher Affiliation | Industry | Gundeep Arora, Srujana Merugu, Anoop Saladi, Rajeev Rastogi (Amazon); gundeepa@amazon.com |
| Pseudocode | Yes | Algorithm 1: Optimal Equi-weight DP-based Multi-Thresholds [EW-DPMT] (a hedged DP sketch follows the table). |
| Open Source Code | No | Moreover, we will publicly open-source our code after we clean up the code package and add proper documentation for it. |
| Open Datasets | Yes | (i) Criteo: an online advertising dataset of 45 MM ad impressions with click outcomes, each with 13 continuous and 26 categorical features; we use the benchmark's 72% : 18% : 10% train-validation-test split. (ii) Avazu: another CTR-prediction dataset of 40 MM samples, each with 22 features describing user and ad attributes; we use the benchmark's 70% : 10% : 20% train-validation-test split. |
| Dataset Splits | Yes | For Criteo and Avazu, we use the SAM architecture (Cheng & Xue, 2021)... For Criteo... We use the split of 72% : 18% : 10% for train-validation-test from the benchmark... (ii) Avazu... We use the train-validation-test splits of 70% : 10% : 20%, from the benchmark... (iii) E-Com... We create train-validation-test sets in the proportion 50% : 12% : 38% from different time periods. |
| Hardware Specification | Yes | All models were trained on an NVIDIA 16 GB V100 GPU. We provide the pseudocode of binning and of all algorithms implemented in Sec. 5 and Appendix G, with bin-configuration details in Sec. 6.2. All binning and decision-boundary operations were performed on a 4-core Intel Xeon 2.3 GHz machine (Broadwell E5-2686 v4) running Linux. |
| Software Dependencies | No | In our implementation, we use the isotonic regression implementation in scikit-learn, which runs in linear time in the input size for the L2 loss (Stout, 2013). (The scikit-learn version number is not reported; a usage sketch follows the table.) |
| Experiment Setup | Yes | For Criteo and Avazu, we use the SAM architecture (Cheng & Xue, 2021) as the backbone with 1 fully-connected layer and 6 radial flow layers for class distribution estimation. For E-Com, we trained an FT-Transformer (Gorishniy et al., 2021) backbone with 8 radial flow layers. Binning strategies: we consider two options: (i) Equi-span, where the uncertainty and score ranges are divided into K and L equal-sized intervals, respectively... (ii) Equi-weight... (see the binning sketch after the table). Table 1 shows the recall at high precision bounds for various decision-boundary algorithms on three large-scale datasets with 500 score and 3 uncertainty bins, averaged over 5 runs with different seeds. |
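
The binning step quoted above is straightforward to reproduce. Below is a minimal sketch, assuming NumPy and 1-D arrays of model scores and uncertainty estimates; the function names and the synthetic data are ours, not the paper's:

```python
import numpy as np

def equi_span_edges(values, n_bins):
    """Equi-span: split the observed value range into n_bins equal-width intervals."""
    lo, hi = values.min(), values.max()
    return np.linspace(lo, hi, n_bins + 1)

def equi_weight_edges(values, n_bins):
    """Equi-weight: choose edges so each bin holds roughly the same number of points."""
    return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))

# Example with the Table 1 configuration: 500 score bins, 3 uncertainty bins.
rng = np.random.default_rng(0)
scores, uncertainties = rng.random(10_000), rng.random(10_000)
score_edges = equi_weight_edges(scores, 500)
unc_edges = equi_weight_edges(uncertainties, 3)
# np.digitize over the interior edges maps each sample to a bin index in [0, n_bins).
score_bin = np.digitize(scores, score_edges[1:-1])
unc_bin = np.digitize(uncertainties, unc_edges[1:-1])
```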
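
The EW-DPMT pseudocode itself (Algorithm 1) is not reproduced here; the sketch below is our reading of the dynamic program, not the authors' implementation. It assumes K uncertainty bins and L score bins, all equi-weight so that every 2-D bin holds `n_per_bin` samples; `H[k][m]` tracks the maximum number of positives captured when the first k uncertainty bins accept m score bins in total, and the precision bound selects the best m at the end.

```python
import numpy as np

def ew_dpmt(pos, n_per_bin, precision_bound):
    """Hedged sketch of an EW-DPMT-style dynamic program (our reconstruction).

    pos[k, l]: positives in uncertainty bin k, score bin l, with score bins
    ordered from highest to lowest score. Every 2-D bin holds n_per_bin samples.
    Returns per-uncertainty-bin counts of accepted top score bins, and recall.
    """
    K, L = pos.shape
    prefix = np.zeros((K, L + 1))
    prefix[:, 1:] = np.cumsum(pos, axis=1)      # positives in top-s score bins
    H = np.full((K + 1, K * L + 1), -1.0)       # -1 marks unreachable states
    H[0, 0] = 0.0
    choice = np.zeros((K + 1, K * L + 1), dtype=int)
    for k in range(1, K + 1):
        for m in range(k * L + 1):              # total score bins accepted so far
            for s in range(min(L, m) + 1):      # bins accepted in uncertainty bin k-1
                if H[k - 1, m - s] < 0:
                    continue
                cand = H[k - 1, m - s] + prefix[k - 1, s]
                if cand > H[k, m]:
                    H[k, m], choice[k, m] = cand, s
    # Among feasible totals m, maximize captured positives subject to the bound:
    # precision = H[K, m] / (n_per_bin * m) >= precision_bound.
    best_m, best_pos = 0, 0.0
    for m in range(1, K * L + 1):
        if H[K, m] >= 0 and H[K, m] / (n_per_bin * m) >= precision_bound:
            if H[K, m] > best_pos:
                best_m, best_pos = m, H[K, m]
    thresholds, m = [], best_m                  # backtrack the per-bin choices
    for k in range(K, 0, -1):
        thresholds.append(choice[k, m])
        m -= choice[k, m]
    thresholds.reverse()
    total_pos = prefix[:, L].sum()
    return thresholds, (best_pos / total_pos if total_pos else 0.0)

# Toy run: 3 uncertainty bins x 500 score bins, 20 samples per bin.
rng = np.random.default_rng(0)
pos = rng.integers(0, 20, size=(3, 500))
thresholds, recall = ew_dpmt(pos, n_per_bin=20, precision_bound=0.8)
```

Because the bins are equi-weight, the total accepted mass depends only on m, which is what lets a single DP table cover all candidate operating points at once.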
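
The missing dependency pin flagged above concerns scikit-learn's isotonic regression. Here is a minimal, hedged usage sketch of the calibration step as we understand it; any recent scikit-learn exposes this API, and the data below is synthetic:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression  # version is unpinned in the paper

rng = np.random.default_rng(0)
raw_scores = rng.random(1_000)
labels = (rng.random(1_000) < raw_scores).astype(int)   # synthetic binary outcomes

# Fit a monotone map from raw scores to calibrated probabilities; scikit-learn's
# pool-adjacent-violators solver is linear-time in the input size for the L2 loss.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrated = iso.fit_transform(raw_scores, labels)
```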