Robust Calibration with Multi-domain Temperature Scaling
Authors: Yaodong Yu, Stephen Bates, Yi Ma, Michael Jordan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on three benchmark datasets, we find our proposed method outperforms existing methods as measured on both in-distribution and out-of-distribution test sets. |
| Researcher Affiliation | Academia | Yaodong Yu University of California, Berkeley Stephen Bates University of California, Berkeley Yi Ma University of California, Berkeley Michael I. Jordan University of California, Berkeley |
| Pseudocode | Yes | A presentation of the algorithm in pseudocode can be found in Algorithm 1, Appendix A. |
| Open Source Code | Yes | Our code is available at https://github.com/yaodongyu/MDTS. |
| Open Datasets | Yes | We evaluate different calibration methods on three datasets, ImageNet-C [Hendrycks and Dietterich, 2019], WILDS-RxRx1 [Koh et al., 2021], and GLDv2 [Weyand et al., 2020]. |
| Dataset Splits | Yes | For every domain $k$, we learn temperature $\hat{T}_k$ by applying temperature scaling on validation data $D_k = \{(x_{i,k}, y_{i,k})\}_{i=1}^{n_k}$ from the $k$-th domain... For all datasets, we randomly sample half of the data from in-distribution domains for calibrating models and use the remaining samples for InD ECE evaluation. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU. |
| Software Dependencies | No | The implementations are mainly based on scikit-learn [Pedregosa et al., 2011]. However, no specific version number for scikit-learn is provided. |
| Experiment Setup | Yes | We apply the SGD optimizer to train the models on the training datasets. We set the bin size as 100 for ImageNet-C, and as 20 for WILDS-RxRx1 and GLDv2. We use grid search (on InD domains) to select hyperparameters for Ridge, Huber, KRR, and KNN. |
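The per-domain step quoted above (learning a temperature $\hat{T}_k$ by temperature scaling on each domain's validation split) can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the function names (`fit_temperature`, `fit_per_domain`) and the use of SciPy's bounded scalar minimizer over the validation negative log-likelihood are my assumptions; the paper's actual code is at https://github.com/yaodongyu/MDTS.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def softmax(logits, T):
    """Temperature-scaled softmax (numerically stabilized)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_temperature(logits, labels):
    """Fit a single temperature T by minimizing validation NLL.

    logits: (n, num_classes) array of model logits on validation data.
    labels: (n,) array of integer class labels.
    """
    n = len(labels)

    def nll(T):
        probs = softmax(logits, T)
        return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

    # Bounded 1-D search; the bounds (0.05, 10.0) are an illustrative choice.
    result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
    return result.x


def fit_per_domain(logits_by_domain, labels_by_domain):
    """Learn one temperature T_k per domain k, each on that domain's
    validation data D_k, as in the quoted Dataset Splits description."""
    return {
        k: fit_temperature(logits_by_domain[k], labels_by_domain[k])
        for k in logits_by_domain
    }
```

A usage sketch: split each in-distribution domain's held-out data in half, pass one half's logits and labels to `fit_per_domain`, and evaluate InD ECE on the remaining half with the fitted temperatures.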