Improving Calibration through the Relationship with Adversarial Robustness

Authors: Yao Qin, Xuezhi Wang, Alex Beutel, Ed Chi

NeurIPS 2021

Variable: Research Type
Result: Experimental
LLM Response: "In this paper, we study the connection between adversarial robustness and calibration and find that the inputs for which the model is sensitive to small perturbations (i.e., are easily attacked) are more likely to have poorly calibrated predictions. ... We perform experiments on the clean test set across three datasets: CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) with different networks, whose architecture and accuracy are shown in Table 1."

Variable: Researcher Affiliation
Result: Industry
LLM Response: "Yao Qin, Xuezhi Wang, Alex Beutel, Ed H. Chi (Google Research); {yaoqin, xuezhiw, alexbeutel, edchi}@google.com"

Variable: Pseudocode
Result: Yes
LLM Response: "Algorithm 1: Training procedure for AR-AdaLS"

Variable: Open Source Code
Result: No
LLM Response: The paper does not provide any statement about releasing source code or a link to a code repository.

Variable: Open Datasets
Result: Yes
LLM Response: "We perform experiments on the clean test set across three datasets: CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015)..."

Variable: Dataset Splits
Result: Yes
LLM Response: "To find the best hyperparameter ϵ for label smoothing, previous methods (Szegedy et al., 2016; Thulasidasan et al., 2019) sweep ϵ in a range and choose the one that has the best validation performance. ... Specifically, we first rank the adversarial robustness of the validation data and split the validation set into R equally-sized subsets."

Variable: Hardware Specification
Result: No
LLM Response: The paper mentions "computational intensity" for the ImageNet experiments but does not provide any specific details about the hardware used, such as GPU models, CPU types, or cloud computing instances.

Variable: Software Dependencies
Result: No
LLM Response: The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).

Variable: Experiment Setup
Result: Yes
LLM Response: "All the methods are trained with the same network architecture, i.e., WRN-28-10 (Zagoruyko & Komodakis, 2016), on both CIFAR-10 and CIFAR-100, and with the same training hyperparameters (e.g., learning rate, batch size, number of training epochs) for fair comparison. ... Please refer to Appendix A for all the training details and hyperparameters."
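The validation-split procedure quoted under Dataset Splits (rank validation examples by adversarial robustness, then partition into R equally-sized subsets) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `split_by_robustness` and the use of a precomputed per-example robustness score are assumptions; the paper defines its own robustness measure.

```python
import numpy as np

def split_by_robustness(robustness_scores, num_subsets):
    """Rank examples by a precomputed adversarial-robustness score
    (hypothetical input; smaller = less robust) and partition their
    indices into `num_subsets` equally-sized subsets.

    Returns a list of index arrays, ordered from least to most robust.
    """
    order = np.argsort(robustness_scores)  # least robust first
    return np.array_split(order, num_subsets)

# Toy usage: 4 validation examples, split into R = 2 subsets.
subsets = split_by_robustness([0.9, 0.1, 0.5, 0.3], num_subsets=2)
```

Each subset could then be assigned its own smoothing strength, in the spirit of the paper's AR-AdaLS training procedure (Algorithm 1).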