reproducibilityindex.ai

Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation

Authors: Weijian Deng, Yumin Suh, Stephen Gould, Liang Zheng

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments validate the effectiveness of nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shifts (e.g., style shift and reproduction shift).
Researcher Affiliation	Collaboration	¹The Australian National University; ²NEC Laboratories America, Inc. (NEC Labs).
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper references publicly available codebases used by the authors for models or dataset generation (e.g., 'We train models using the implementations from https://github.com/chenyaofo/pytorch-cifar-models.'), but does not state that the code for the methodology described in this paper is open-sourced or available.
Open Datasets	Yes	The datasets we use are standard benchmarks, which are publicly available. We have double-checked their license. We list their open-source as follows. CIFAR-10 (Krizhevsky et al., 2009) (https://www.cs.toronto.edu/~kriz/cifar.html); CIFAR-10-C (Hendrycks & Dietterich, 2019) (https://github.com/hendrycks/robustness); CIFAR-10.1 (Recht et al., 2018) (https://github.com/modestyachts/CIFAR-10.1); CINIC (Chrabaszcz et al., 2017) (https://github.com/BayesWatch/cinic-10). ImageNet-Validation (Deng et al., 2009) (https://www.image-net.org); ImageNet-V2-A/B/C (Recht et al., 2019) (https://github.com/modestyachts/ImageNetV2); ImageNet-Corruption (Hendrycks & Dietterich, 2019) (https://github.com/hendrycks/robustness); ImageNet-Sketch (Wang et al., 2019) (https://github.com/HaohanWang/ImageNet-Sketch); ImageNet-Rendition (Hendrycks et al., 2021) (https://github.com/hendrycks/imagenet-r); ObjectNet (Barbu et al., 2019) (https://objectnet.dev). CUB-200-2011 (Wah et al., 2011) (https://www.vision.caltech.edu/datasets/cub_200_2011). CUB-Paintings (Wang et al., 2020) (https://github.com/thuml/PAN).
Dataset Splits	No	The paper mentions using 'ImageNet training set' and 'ImageNet validation set' for corruptions, and 'CIFAR-10 training set' for models, but does not provide explicit training, validation, or test dataset splits (percentages or absolute counts) for their models within these datasets. It refers to using existing datasets like ImageNet-C derived from the validation set as their test sets.
Hardware Specification	Yes	We run all experiments on one 3090Ti with PyTorch (1.11.0+cu113). CPU is AMD Ryzen 9 5900X 12-Core Processor.
Software Dependencies	Yes	We run all experiments on one 3090Ti with PyTorch (1.11.0+cu113).
Experiment Setup	Yes	We empirically find that using a small temperature for softmax is helpful for all methods. Therefore, we use a temperature of 0.4 for all methods in the experiment.
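
The two ingredients named in the table, a softmax temperature of 0.4 and the nuclear norm of the prediction matrix, can be illustrated with a minimal sketch. This is not the authors' code (the table notes that none was released). The function names are hypothetical, NumPy is used here instead of PyTorch for brevity, and the normalization by sqrt(min(N, K) * N) is an assumption: it is a mathematical upper bound on the nuclear norm of an N x K matrix whose rows are probability vectors, and it may not match the paper's exact scaling.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=0.4):
    """Row-wise softmax of an (N, K) logit matrix at the given temperature.

    A temperature below 1 sharpens the distribution; 0.4 is the value
    quoted in the Experiment Setup row.
    """
    scaled = logits / temperature
    # Subtract the row maximum for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def normalized_nuclear_norm(probs):
    """Nuclear norm (sum of singular values) of an (N, K) prediction matrix.

    Assumption: the score is divided by sqrt(min(N, K) * N), an upper
    bound for row-stochastic matrices, so the result lies in (0, 1].
    """
    n, k = probs.shape
    nuclear = np.linalg.norm(probs, ord="nuc")
    return nuclear / np.sqrt(min(n, k) * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical logits: 1000 unlabeled test samples, 10 classes.
    logits = rng.normal(size=(1000, 10))
    probs = softmax_with_temperature(logits, temperature=0.4)
    print(normalized_nuclear_norm(probs))
```

Under these assumptions, a higher score indicates predictions that are both confident (rows close to one-hot) and dispersed (spread across classes). The score would then be compared across test sets as a label-free proxy for accuracy.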