reproducibilityindex.ai

Robust Calibration with Multi-domain Temperature Scaling

Authors: Yaodong Yu, Stephen Bates, Yi Ma, Michael Jordan

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through experiments on three benchmark data sets, we find our proposed method outperforms existing methods as measured on both in-distribution and out-of-distribution test sets.
Researcher Affiliation	Academia	Yaodong Yu (University of California, Berkeley); Stephen Bates (University of California, Berkeley); Yi Ma (University of California, Berkeley); Michael I. Jordan (University of California, Berkeley)
Pseudocode	Yes	A presentation of the algorithm in pseudocode can be found in Algorithm 1, Appendix A.
Open Source Code	Yes	Our code is available at https://github.com/yaodongyu/MDTS.
Open Datasets	Yes	We evaluate different calibration methods on three datasets, ImageNet-C [Hendrycks and Dietterich, 2019], WILDS-RxRx1 [Koh et al., 2021], and GLDv2 [Weyand et al., 2020].
Dataset Splits	Yes	For every domain k, we learn temperature $\hat{T}_k$ by applying temperature scaling on validation data $D_k = \{(x_{i,k}, y_{i,k})\}_{i=1}^{n_k}$ from k-th domain... For all datasets, we randomly sample half of the data from in-distribution domains for calibrating models and use the remaining samples for InD ECE evaluation.
Hardware Specification	Yes	All experiments are conducted on an NVIDIA A100 GPU.
Software Dependencies	No	The implementations are mainly based on scikit-learn [Pedregosa et al., 2011]. However, no specific version number for scikit-learn is provided.
Experiment Setup	Yes	We apply SGD optimizer to training the models on training datasets. We set the bin size as 100 for ImageNet-C, and set bin size as 20 for WILDS-RxRx1 and GLDv2. We use grid search (on InD domains) to select hyperparameters for Ridge, Huber, KRR, and KNN.
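
The "Dataset Splits" and "Experiment Setup" rows describe two steps that are easy to misread in prose: fitting one temperature per domain on held-out data, and scoring calibration with a binned ECE. The sketch below illustrates both using only the Python standard library. It is not the authors' code (their repository is linked in the table, and the paper reports using scikit-learn). The function names, the golden-section search, the search range for the temperature, and the reading of "bin size" as the number of equal-width confidence bins are all assumptions made for illustration.

```python
import math


def log_softmax_at(logits, label, temperature):
    """Log-probability of `label` under softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[label] - log_norm


def nll(logits_list, labels, temperature):
    """Mean negative log-likelihood at a given temperature."""
    total = sum(log_softmax_at(z, y, temperature)
                for z, y in zip(logits_list, labels))
    return -total / len(labels)


def fit_temperature(logits_list, labels, t_min=0.05, t_max=20.0, iters=60):
    """Fit a single temperature by golden-section search over log T.

    The NLL is convex in 1/T, hence unimodal in log T, so a bracketing
    search is sufficient. The range [t_min, t_max] is an assumed default.
    """
    ratio = (math.sqrt(5) - 1) / 2
    a, b = math.log(t_min), math.log(t_max)
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if nll(logits_list, labels, math.exp(c)) < nll(logits_list, labels, math.exp(d)):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)


def fit_per_domain(domains):
    """Map each domain name to its own fitted temperature.

    `domains` is a hypothetical dict: name -> (logits_list, labels).
    """
    return {name: fit_temperature(z, y) for name, (z, y) in domains.items()}


def expected_calibration_error(confidences, correct, n_bins=20):
    """ECE with equal-width bins: weighted mean of |accuracy - confidence|."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += hit
    n = len(confidences)
    return sum(
        (cnt / n) * abs(hit_sums[i] / cnt - conf_sums[i] / cnt)
        for i, cnt in enumerate(counts) if cnt > 0
    )
```

In this sketch, an overconfident domain (high logit margins but modest accuracy) yields a fitted temperature above 1, which softens its predictions. Following the split described in the table, the temperatures would be fitted on one random half of the in-distribution data and the ECE computed on the other half.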