Calibration by Distribution Matching: Trainable Kernel Calibration Metrics
Authors: Charlie Marx, Sofian Zalouk, Stefano Ermon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our empirical evaluation demonstrates that employing these metrics as regularizers enhances calibration, sharpness, and decision-making across a range of regression and classification tasks, outperforming methods relying solely on post-hoc recalibration." See also Section 6 (Experiments). |
| Researcher Affiliation | Academia | Charles Marx (Stanford University, ctmarx@cs.stanford.edu); Sofian Zalouk (Stanford University, szalouk@stanford.edu); Stefano Ermon (Stanford University, ermon@cs.stanford.edu) |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. Methodological steps are described in prose. |
| Open Source Code | Yes | Code to reproduce experiments can be found at https://github.com/kernel-calibration/kernel-calibration/. |
| Open Datasets | Yes | "We use four tabular UCI datasets (SUPERCONDUCTIVITY [16], CRIME [34], BLOG [6], FB-COMMENT [41]), as well as the Medical Expenditure Panel Survey dataset (MEDICAL-EXPENDITURE [7])." and "We use five tabular UCI datasets: BREAST-CANCER [49], HEART-DISEASE [18], ONLINE-SHOPPERS [40], DRY-BEAN [1], and ADULT [2]." |
| Dataset Splits | Yes | "For each dataset, we randomly assign 70% of the dataset for training, 10% for validation, and 20% for testing." (A split sketch appears below the table.) |
| Hardware Specification | Yes | "All experiments were conducted on a single CPU machine (Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz), utilizing 8 cores per experiment." and "To accelerate training, we have also run some experiments using an 11GB NVIDIA GeForce GTX 1080 Ti." |
| Software Dependencies | No | The paper mentions software like Python and PyTorch generally, but does not provide specific version numbers for these or other key software components used in the experiments. |
| Experiment Setup | Yes | "For all experiments, we vary: Layer sizes between 32 and 512; RBF kernel bandwidths between 0.001 and 200; Batch sizes between 16 and 512, with and without batch normalization; Learning rates between 10^-7 and 10^-1; The loss mixture weight λ (as in NLL + λ MMD and XE + λ MMD) between 0.1 and 1000." (A sketch of the combined training objective appears below the table.) |
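
The dataset-split row quotes a 70% / 10% / 20% train/validation/test partition. Below is a minimal sketch of one way to reproduce such a split with scikit-learn; the function name `split_dataset` and the fixed seed are illustrative assumptions, not taken from the authors' repository.

```python
# Hypothetical 70/10/20 split helper (not the authors' code).
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed=0):
    # First carve off the 20% test set, then split the remaining 80%
    # into 70% train / 10% validation (0.125 of the remainder = 10% overall).
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.125, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```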
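The experiment-setup row refers to objectives of the form NLL + λ MMD (regression) and XE + λ MMD (classification), with an RBF kernel bandwidth and mixture weight λ as tuned hyperparameters. The following is a minimal PyTorch sketch of what such a regression objective could look like: a Gaussian predictive model trained with negative log-likelihood plus a λ-weighted RBF-kernel MMD term comparing (x, observed y) pairs against (x, y sampled from the model). Everything here (the `GaussianMLP` architecture, `mmd_loss` estimator, and default values) is an assumption for illustration, not the authors' implementation; the XE + λ MMD classification variant would follow the same pattern with a cross-entropy fit term.

```python
# Minimal sketch (not the authors' code) of an NLL + lambda * MMD training objective.
import torch
import torch.nn as nn

class GaussianMLP(nn.Module):
    """Small MLP predicting a Gaussian mean and log-variance per input."""
    def __init__(self, in_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))

    def forward(self, x):
        out = self.body(x)
        return out[:, :1], out[:, 1:]  # mean, log-variance, each (N, 1)

def rbf_kernel(a, b, bandwidth=1.0):
    # Pairwise RBF kernel between rows of a and b.
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * bandwidth ** 2))

def mmd_loss(x, y_true, y_samp, bandwidth=1.0):
    # Biased (V-statistic) estimate of MMD^2 between the joint samples
    # (x, y_true) and (x, y_samp). x: (N, d); y_true, y_samp: (N, 1).
    real = torch.cat([x, y_true], dim=1)
    fake = torch.cat([x, y_samp], dim=1)
    return (rbf_kernel(real, real, bandwidth).mean()
            + rbf_kernel(fake, fake, bandwidth).mean()
            - 2 * rbf_kernel(real, fake, bandwidth).mean())

def training_step(model, x, y, lam=1.0, bandwidth=1.0):
    mu, log_var = model(x)
    # Gaussian negative log-likelihood (up to an additive constant).
    nll = 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()
    # Reparameterized sample from the predictive distribution for the MMD term.
    y_samp = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
    return nll + lam * mmd_loss(x, y, y_samp, bandwidth)
```

Under this reading, the quoted search ranges correspond to sweeping `hidden` (32 to 512), `bandwidth` (0.001 to 200), the batch size (16 to 512), the learning rate (10^-7 to 10^-1), and `lam` (0.1 to 1000).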