Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation
Authors: Weijian Deng, Yumin Suh, Stephen Gould, Liang Zheng
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness of nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shifts (e.g., style shift and reproduction shift). |
| Researcher Affiliation | Collaboration | ¹The Australian National University; ²NEC Laboratories America, Inc. (NEC Labs). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references publicly available codebases used by the authors for models or dataset generation (e.g., 'We train models using the implementations from https://github.com/chenyaofo/pytorch-cifar-models.'), but does not state that the code for the methodology described in *this* paper is open-sourced or available. |
| Open Datasets | Yes | The datasets we use are standard benchmarks, which are publicly available. We have double-checked their license. We list their open-source as follows. CIFAR-10 (Krizhevsky et al., 2009) (https://www.cs.toronto.edu/~kriz/cifar.html); CIFAR-10-C (Hendrycks & Dietterich, 2019) (https://github.com/hendrycks/robustness); CIFAR-10.1 (Recht et al., 2018) (https://github.com/modestyachts/CIFAR-10.1); CINIC (Chrabaszcz et al., 2017) (https://github.com/BayesWatch/cinic-10). ImageNet-Validation (Deng et al., 2009) (https://www.image-net.org); ImageNet-V2-A/B/C (Recht et al., 2019) (https://github.com/modestyachts/ImageNetV2); ImageNet-Corruption (Hendrycks & Dietterich, 2019) (https://github.com/hendrycks/robustness); ImageNet-Sketch (Wang et al., 2019) (https://github.com/HaohanWang/ImageNet-Sketch); ImageNet-Rendition (Hendrycks et al., 2021) (https://github.com/hendrycks/imagenet-r); ObjectNet (Barbu et al., 2019) (https://objectnet.dev). CUB-200-2011 (Wah et al., 2011) (https://www.vision.caltech.edu/datasets/cub_200_2011). CUB-Paintings (Wang et al., 2020) (https://github.com/thuml/PAN). |
| Dataset Splits | No | The paper mentions using 'ImageNet training set' and 'ImageNet validation set' for corruptions, and 'CIFAR-10 training set' for models, but does not provide explicit training, validation, or test dataset splits (percentages or absolute counts) for their models within these datasets. It refers to using existing datasets like ImageNet-C derived from the validation set as their test sets. |
| Hardware Specification | Yes | We run all experiments on one 3090Ti with PyTorch (1.11.0+cu113). CPU is AMD Ryzen 9 5900X 12-Core Processor. |
| Software Dependencies | Yes | We run all experiments on one 3090Ti with PyTorch (1.11.0+cu113). |
| Experiment Setup | Yes | We empirically find that using a small temperature for softmax is helpful for all methods. Therefore, we use a temperature of 0.4 for all methods in the experiment. |
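
The table notes that the paper contains no pseudocode and does not state that its code is available, so the following is only a minimal sketch of how the two ingredients named above (a softmax temperature of 0.4 and the nuclear norm of the prediction matrix) might be combined. It assumes that the temperature divides the logits before the softmax and that the prediction matrix stacks one softmax row per unlabeled test sample. The function names and the normalization by sqrt(min(N, K) * N), a generic upper bound on the nuclear norm of an N x K row-stochastic matrix, are illustrative choices and are not taken from the table.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=0.4):
    """Row-wise softmax of logits / temperature (assumed convention)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Subtract the row maximum for numerical stability; the result is unchanged.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def prediction_nuclear_norm(logits, temperature=0.4):
    """Nuclear norm (sum of singular values) of the N x K prediction matrix."""
    probs = softmax_with_temperature(logits, temperature)
    return float(np.linalg.norm(probs, ord="nuc"))


def normalized_nuclear_norm(logits, temperature=0.4):
    """Nuclear norm divided by sqrt(min(N, K) * N), so the score lies in [0, 1].

    The normalization is an assumption made for this sketch.
    """
    n_samples, n_classes = np.asarray(logits).shape
    upper_bound = np.sqrt(min(n_samples, n_classes) * n_samples)
    return prediction_nuclear_norm(logits, temperature) / upper_bound
```

Under these assumptions, predictions that are both confident and spread across classes (for example, one sample per class with a large logit on a distinct class) push the normalized score toward 1, while uniform predictions yield a rank-one matrix and a low score. A higher score would then be read as a proxy for higher accuracy on the unlabeled test set.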