Calibration by Distribution Matching: Trainable Kernel Calibration Metrics

Authors: Charlie Marx, Sofian Zalouk, Stefano Ermon

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical evaluation demonstrates that employing these metrics as regularizers enhances calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods relying solely on post-hoc recalibration." and Section 6 (Experiments)
Researcher Affiliation | Academia | Charles Marx, Stanford University (ctmarx@cs.stanford.edu); Sofian Zalouk, Stanford University (szalouk@stanford.edu); Stefano Ermon, Stanford University (ermon@cs.stanford.edu)
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks; methodological steps are described in prose.
Open Source Code | Yes | "Code to reproduce experiments can be found at https://github.com/kernel-calibration/kernel-calibration/."
Open Datasets | Yes | "We use four tabular UCI datasets (SUPERCONDUCTIVITY [16], CRIME [34], BLOG [6], FB-COMMENT [41]), as well as the Medical Expenditure Panel Survey dataset (MEDICAL-EXPENDITURE [7])." and "We use five tabular UCI datasets: BREAST-CANCER [49], HEART-DISEASE [18], ONLINE-SHOPPERS [40], DRY-BEAN [1], and ADULT [2]."
Dataset Splits | Yes | "For each dataset, we randomly assign 70% of the dataset for training, 10% for validation, and 20% for testing." (see the split sketch after the table)
Hardware Specification | Yes | "All experiments were conducted on a single CPU machine (Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz), utilizing 8 cores per experiment." and "To accelerate training, we have also run some experiments using an 11GB NVIDIA GeForce GTX 1080 Ti."
Software Dependencies | No | The paper mentions software such as Python and PyTorch in general terms, but does not provide specific version numbers for these or other key software components used in the experiments.
Experiment Setup | Yes | "For all experiments, we vary: layer sizes between 32 and 512; RBF kernel bandwidths between 0.001 and 200; batch sizes between 16 and 512, with and without batch normalization; learning rates between 10^-7 and 10^-1; the loss mixture weight λ (as in NLL + λ·MMD and XE + λ·MMD) between 0.1 and 1000." (see the training-objective sketch after the table)
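
As noted in the Dataset Splits row, each dataset is partitioned 70% / 10% / 20% into training, validation, and test sets. A minimal sketch of such a split, assuming scikit-learn; the helper name and random seed are illustrative and not taken from the authors' repository:

```python
# Minimal sketch of a 70% / 10% / 20% train/validation/test split.
# Assumes scikit-learn; the seed and helper name are illustrative.
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed=0):
    # First hold out 20% of the data as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed
    )
    # 10% of the full dataset equals 0.10 / 0.80 = 12.5% of the remaining data.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.125, random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```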
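
The Experiment Setup row refers to training objectives of the form NLL + λ·MMD, i.e. a negative log-likelihood regularized by a kernel-based calibration term weighted by λ. The sketch below only illustrates the general shape of such an objective, assuming PyTorch and a Gaussian predictive head; the rbf_kernel, mmd_loss, and nll_plus_mmd helpers, the single RBF bandwidth, and the λ value are hypothetical simplifications rather than the trainable kernel calibration metrics implemented in the authors' repository.

```python
# Illustrative sketch (not the authors' implementation) of an NLL + lambda * MMD
# objective: a biased RBF-kernel MMD between observed (x, y) pairs and
# (x, y_hat) pairs with y_hat sampled from the model's predictive distribution.
import torch

def rbf_kernel(a, b, bandwidth=1.0):
    # a: (n, d), b: (m, d) -> (n, m) RBF Gram matrix.
    sq_dists = torch.cdist(a, b).pow(2)
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_loss(x, y, y_sampled, bandwidth=1.0):
    # Compare joint samples (x, y) against (x, y_sampled).
    real = torch.cat([x, y.unsqueeze(-1)], dim=-1)
    fake = torch.cat([x, y_sampled.unsqueeze(-1)], dim=-1)
    k_rr = rbf_kernel(real, real, bandwidth).mean()
    k_ff = rbf_kernel(fake, fake, bandwidth).mean()
    k_rf = rbf_kernel(real, fake, bandwidth).mean()
    return k_rr + k_ff - 2.0 * k_rf

def nll_plus_mmd(mu, log_sigma, x, y, lam=1.0, bandwidth=1.0):
    # Gaussian negative log-likelihood of the observed targets.
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    nll = -dist.log_prob(y).mean()
    # Reparameterized sample keeps the MMD term differentiable
    # with respect to the network outputs.
    y_sampled = dist.rsample()
    return nll + lam * mmd_loss(x, y, y_sampled, bandwidth)
```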