On Calibration and Out-of-Domain Generalization
Authors: Yoav Wald, Amir Feder, Daniel Greenfeld, Uri Shalit
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using four datasets from the recently proposed WILDS OOD benchmark [23], as well as the Colored MNIST dataset [21], we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains. |
| Researcher Affiliation | Collaboration | Yoav Wald, Johns Hopkins University, yoav.wald@gmail.com; Amir Feder, Technion, amirfeder@gmail.com; Daniel Greenfeld, Jether Energy Research, danielgreenfeld3@gmail.com; Uri Shalit, Technion, urishalit@technion.ac.il |
| Pseudocode | No | The paper describes methods and concepts but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We are preparing the code for publication and will do our best to have it ready by the end of the review period. |
| Open Datasets | Yes | Using four datasets from the recently proposed WILDS OOD benchmark [23], as well as the Colored MNIST dataset [21], we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains. |
| Dataset Splits | Yes | In order to perform multi-domain calibration we modify the splits to include a multi-domain validation set whenever possible. See supplemental Section B for details and for additional results on Amazon Reviews. ... We specify hyperparameters and training details in the supplementary material (for both the WILDS benchmark and Colored MNIST). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as CPU or GPU models, or cloud computing resources. |
| Software Dependencies | No | The paper mentions using PyTorch in the ethics checklist, but it does not specify version numbers for PyTorch or any other software dependencies needed to reproduce the experiments. |
| Experiment Setup | Yes | We specify hyperparameters and training details in the supplementary material (for both the WILDS benchmark and Colored MNIST). When using a training setup from other works (e.g. in Colored MNIST), we give a reference to the work and specify changes we made upon their setup. |
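
The table quotes the paper's claim that models "calibrated across multiple domains" generalize better to unseen domains. As a rough illustration of what checking calibration per domain can look like, the sketch below computes a standard binned expected calibration error (ECE) separately for each domain. It is a generic diagnostic and not the authors' training or tuning procedure; the function names `expected_calibration_error` and `per_domain_ece`, the equal-width binning, and the default of 10 bins are all assumptions made for this example.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    bins = defaultdict(list)
    for conf, hit in zip(confidences, correct):
        # Clamp the index so that conf == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins.values():
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


def per_domain_ece(confidences, correct, domains, n_bins=10):
    """Group examples by domain label (hypothetical) and compute ECE in each group."""
    grouped = defaultdict(lambda: ([], []))
    for conf, hit, dom in zip(confidences, correct, domains):
        grouped[dom][0].append(conf)
        grouped[dom][1].append(hit)
    return {
        dom: expected_calibration_error(confs, hits, n_bins)
        for dom, (confs, hits) in grouped.items()
    }


if __name__ == "__main__":
    # Toy data: two made-up domains with different calibration gaps.
    confs = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
    hits = [1, 1, 1, 0, 1, 0]
    doms = ["a", "a", "a", "a", "b", "b"]
    print(per_domain_ece(confs, hits, doms))
```

A model can have a low ECE on pooled data while being poorly calibrated in individual domains, which is why the sketch reports one value per domain rather than a single aggregate.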