Improving Calibration through the Relationship with Adversarial Robustness
Authors: Yao Qin, Xuezhi Wang, Alex Beutel, Ed Chi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we study the connection between adversarial robustness and calibration and find that the inputs for which the model is sensitive to small perturbations (are easily attacked) are more likely to have poorly calibrated predictions. ... We perform experiments on the clean test set across three datasets: CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) with different networks, whose architecture and accuracy are shown in Table 1. |
| Researcher Affiliation | Industry | Yao Qin Xuezhi Wang Alex Beutel Ed H. Chi Google Research {yaoqin, xuezhiw, alexbeutel, edchi}@google.com |
| Pseudocode | Yes | Algorithm 1 Training procedure for AR-AdaLS (see the sketch below the table) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We perform experiments on the clean test set across three datasets: CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015)... |
| Dataset Splits | Yes | To find the best hyperparameter ϵ for label smoothing, previous methods (Szegedy et al., 2016; Thulasidasan et al., 2019) sweep ϵ in a range and choose the one that has the best validation performance. ... Specifically, we first rank the adversarial robustness of the validation data and split the validation set into R equally-sized subsets. |
| Hardware Specification | No | The paper mentions 'computational intensity' for ImageNet experiments but does not provide any specific details about the hardware used, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All the methods are trained with the same network architecture, i.e., WRN-28-10 (Zagoruyko & Komodakis, 2016) on both CIFAR-10 and CIFAR-100, and the same training hyperparameters: e.g., learning rate, batch size, number of training epochs, for fair comparison. ... Please refer to Appendix A for all the training details and hyperparameters. |
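The Pseudocode and Dataset Splits rows above quote the core idea of AR-AdaLS: rank examples by an adversarial-robustness score, split them into R equally-sized subsets, and apply stronger label smoothing to the less robust (easily attacked) subsets. The sketch below illustrates that idea only; the robustness scores, helper names, bin count, and per-bin smoothing values are illustrative assumptions, not the paper's Algorithm 1.

```python
# Minimal sketch of robustness-binned adaptive label smoothing, written from the
# quoted description only. Robustness scores and eps_per_bin are placeholders
# (in the paper they come from adversarial attacks and validation calibration).
import numpy as np


def smooth_labels(one_hot: np.ndarray, eps: float) -> np.ndarray:
    """Standard label smoothing with a single eps swept on validation data."""
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / num_classes


def assign_robustness_bins(robustness: np.ndarray, num_bins: int) -> np.ndarray:
    """Rank examples by adversarial robustness and split them into
    num_bins equally-sized subsets (bin 0 = least robust)."""
    order = np.argsort(robustness)  # ascending: least robust first
    bins = np.empty_like(order)
    bins[order] = np.arange(len(order)) * num_bins // len(order)
    return bins


def adaptive_smoothed_targets(labels: np.ndarray,
                              robustness: np.ndarray,
                              eps_per_bin: np.ndarray,
                              num_classes: int) -> np.ndarray:
    """Apply a bin-specific smoothing strength: less robust examples
    receive softer (more heavily smoothed) training targets."""
    one_hot = np.eye(num_classes)[labels]
    bins = assign_robustness_bins(robustness, len(eps_per_bin))
    eps = eps_per_bin[bins][:, None]
    return one_hot * (1.0 - eps) + eps / num_classes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 10, size=8)            # CIFAR-10-style labels
    robustness = rng.random(8)                      # placeholder robustness scores
    # Hypothetical per-bin smoothing, e.g. chosen on a held-out validation split.
    eps_per_bin = np.array([0.3, 0.2, 0.1, 0.05])   # bin 0 = least robust
    targets = adaptive_smoothed_targets(labels, robustness, eps_per_bin, 10)
    print(targets.round(3))
```

In this sketch the binning step mirrors the quoted validation procedure (rank by robustness, split into R equal subsets); in actual training the per-bin smoothing would be set from validation calibration rather than hard-coded.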