Robust Calibration with Multi-domain Temperature Scaling

Authors: Yaodong Yu, Stephen Bates, Yi Ma, Michael I. Jordan

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments on three benchmark datasets, we find our proposed method outperforms existing methods as measured on both in-distribution and out-of-distribution test sets.
Researcher Affiliation | Academia | Yaodong Yu, Stephen Bates, Yi Ma, Michael I. Jordan — all affiliated with the University of California, Berkeley.
Pseudocode | Yes | A presentation of the algorithm in pseudocode can be found in Algorithm 1, Appendix A.
Open Source Code | Yes | Our code is available at https://github.com/yaodongyu/MDTS.
Open Datasets | Yes | We evaluate different calibration methods on three datasets: ImageNet-C [Hendrycks and Dietterich, 2019], WILDS-RxRx1 [Koh et al., 2021], and GLDv2 [Weyand et al., 2020].
Dataset Splits | Yes | For every domain k, we learn a temperature T̂_k by applying temperature scaling on validation data D_k = {(x_{i,k}, y_{i,k})}_{i=1}^{n_k} from the k-th domain... For all datasets, we randomly sample half of the data from in-distribution domains for calibrating models and use the remaining samples for InD ECE evaluation.
Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU.
Software Dependencies | No | The implementation is mainly based on scikit-learn [Pedregosa et al., 2011], but no specific scikit-learn version number is provided.
Experiment Setup | Yes | We apply the SGD optimizer to train the models on the training datasets. We set the bin size to 100 for ImageNet-C, and to 20 for WILDS-RxRx1 and GLDv2. We use grid search (on InD domains) to select hyperparameters for Ridge, Huber, KRR, and KNN.
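The per-domain temperature scaling step quoted in the Dataset Splits row — fitting one scalar T̂_k per domain on that domain's validation split — can be sketched as below. This is a generic reconstruction of standard temperature scaling (NLL minimization over a scalar temperature), not the authors' released code; the function names, the dict-of-arrays input format, and the search bounds are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit a scalar temperature by minimizing negative log-likelihood
    on held-out validation logits (standard temperature scaling)."""
    def nll(t):
        z = logits / t
        # log-softmax with max subtraction for numerical stability
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()
    # Search bounds are a hypothetical choice, not from the paper.
    res = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
    return res.x

def fit_per_domain_temperatures(domain_logits, domain_labels):
    """One temperature T_k per domain, each fit on that domain's
    validation data D_k (inputs are hypothetical dicts keyed by domain)."""
    return {k: fit_temperature(domain_logits[k], domain_labels[k])
            for k in domain_logits}
```

In the paper's multi-domain setting, these per-domain temperatures then serve as regression targets for the Ridge/Huber/KRR/KNN estimators mentioned under Experiment Setup, so that a temperature can be predicted for an unseen domain.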
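The ECE evaluation referenced above (bin size 100 for ImageNet-C, 20 for WILDS-RxRx1 and GLDv2) can be sketched with the standard equal-width-bin estimator: partition predictions by confidence, then average |accuracy − confidence| weighted by bin mass. This assumes the conventional ECE definition rather than the paper's exact implementation.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=20):
    """Binned ECE over equal-width confidence bins; n_bins mirrors the
    paper's bin sizes (100 for ImageNet-C, 20 for the other datasets)."""
    conf = probs.max(axis=1)            # predicted confidence per sample
    pred = probs.argmax(axis=1)         # predicted class per sample
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # bin weight * |bin accuracy - bin confidence|
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

Used on the held-out half of each in-distribution domain's data, this gives the InD ECE numbers described in the Dataset Splits row.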