Learning model uncertainty as variance-minimizing instance weights

Authors: Nishant Jain, Karthikeyan Shanmugam, Pradeep Shenoy

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show in controlled experiments that we effectively capture diverse specific notions of uncertainty through this meta-objective, while previous approaches only capture certain aspects. These results translate to significant gains in real-world settings (selective classification, label noise, domain adaptation, calibration) and across datasets (ImageNet, CIFAR-100, diabetic retinopathy, Camelyon, WILDS, ImageNet-C/-A/-R, Clothing1M, etc.)."
Researcher Affiliation | Industry | Nishant Jain, Karthikeyan Shanmugam & Pradeep Shenoy, Google Research India, {nishantjn, karthikeyanvs, shenoypradeep}@google.com
Pseudocode | Yes | "Algorithm 1: REVAR training procedure." (An illustrative sketch of this style of meta-reweighting loop appears after the table.)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that source code for the methodology is openly available.
Open Datasets | Yes | "We used the Diabetic Retinopathy (DR) detection dataset (Kaggle, 2015), a significant real-world benchmark for selective classification, alongside the APTOS DR test dataset (Society, 2019) for covariate shift analysis. We also used the CIFAR-100, ImageNet-100, and ImageNet-1K datasets."
Dataset Splits | Yes | "For our re-weighting scheme, we separate 10 percent of the data as the validation set."
Hardware Specification | No | The paper does not specify the GPU models, CPU models, or other hardware used to run the experiments.
Software Dependencies | No | The paper does not specify version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "We used a learning rate of 0.003 and a batch size of 64 to train each model in each of the experiments. For our re-weighting scheme, we separate 10 percent of the data as the validation set. ... A batch size of 128 is used and an initial learning rate of 1e-2 with a momentum of 0.9 is used. For U-SCORE we have used a learning rate of 1e-4 with a momentum of 0.9 and the same batch size as the classifier for all the experiments. ... A weight decay of 10^-4 is used for both the networks. For all experiments, training is done for 300 epochs." (The quoted hyperparameters are collected in the configuration sketch after the table.)
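
The quoted Algorithm 1 is not reproduced in this report, but the paper's description (per-instance weights learned under a variance-minimizing meta-objective, using a held-out validation set) matches the general shape of a meta-reweighting training loop. Below is a minimal first-order PyTorch sketch of that shape, not the paper's actual REVAR procedure: `UScoreNet`, the two step functions, and the exact form of the meta-loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UScoreNet(nn.Module):
    """Illustrative weighting network: maps a per-example loss value to an
    instance weight in (0, 1). The paper's actual U-SCORE inputs and
    architecture are not specified in the quotes above."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, per_example_loss):
        return torch.sigmoid(self.net(per_example_loss.unsqueeze(1))).squeeze(1)

def weighted_train_step(classifier, uscore, opt_cls, xb, yb):
    """Classifier update on a training batch with per-instance weights."""
    losses = F.cross_entropy(classifier(xb), yb, reduction="none")
    with torch.no_grad():               # weights act as constants for this step
        w = uscore(losses.detach())
    opt_cls.zero_grad()
    (w * losses).mean().backward()
    opt_cls.step()

def variance_meta_step(classifier, uscore, opt_u, xv, yv):
    """Weighting-network update on a held-out validation batch. The
    variance-plus-mean meta-loss below is an assumed stand-in for the
    paper's variance-minimizing meta-objective, not its exact form."""
    with torch.no_grad():
        losses = F.cross_entropy(classifier(xv), yv, reduction="none")
    weighted = uscore(losses) * losses
    meta_loss = weighted.var() + weighted.mean()
    opt_u.zero_grad()
    meta_loss.backward()
    opt_u.step()
```

A faithful reproduction of Algorithm 1 would presumably differentiate the validation objective through the classifier's update (a bilevel meta-gradient); this sketch keeps the two updates first-order and decoupled for brevity.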
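The quoted splits and hyperparameters map directly onto a data and optimizer configuration. Here is a self-contained sketch, with a dummy dataset and placeholder classifier standing in for the paper's CIFAR-100/ImageNet experiments, reusing the assumed `UScoreNet` and step functions from the sketch above:

```python
import torch
from itertools import cycle
from torch.utils.data import TensorDataset, random_split, DataLoader

torch.manual_seed(0)

# Dummy stand-in for CIFAR-100-style data; the real experiments use the
# datasets listed in the table above.
train_set = TensorDataset(torch.randn(1000, 3 * 32 * 32),
                          torch.randint(0, 100, (1000,)))

# "We separate 10 percent of the data as the validation set."
n_val = int(0.1 * len(train_set))
train_subset, val_subset = random_split(train_set, [len(train_set) - n_val, n_val])
train_loader = DataLoader(train_subset, batch_size=128, shuffle=True)  # quoted batch size
val_loader = DataLoader(val_subset, batch_size=128, shuffle=True)      # "same as classifier"

# Placeholder models; the quotes do not pin down the architectures.
classifier = torch.nn.Linear(3 * 32 * 32, 100)
uscore = UScoreNet()

# SGD with momentum 0.9 and weight decay 1e-4 for both networks;
# lr 1e-2 for the classifier, 1e-4 for U-SCORE (quoted values).
opt_cls = torch.optim.SGD(classifier.parameters(), lr=1e-2,
                          momentum=0.9, weight_decay=1e-4)
opt_u = torch.optim.SGD(uscore.parameters(), lr=1e-4,
                        momentum=0.9, weight_decay=1e-4)

# "For all experiments, training is done for 300 epochs."
for _ in range(300):
    val_batches = cycle(val_loader)  # recycle validation batches within an epoch
    for xb, yb in train_loader:
        xv, yv = next(val_batches)
        weighted_train_step(classifier, uscore, opt_cls, xb, yb)
        variance_meta_step(classifier, uscore, opt_u, xv, yv)
```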