On Calibration and Out-of-Domain Generalization

Authors: Yoav Wald, Amir Feder, Daniel Greenfeld, Uri Shalit

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using four datasets from the recently proposed WILDS OOD benchmark [23], as well as the Colored MNIST dataset [21], we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains.
Researcher Affiliation | Collaboration | Yoav Wald, Johns Hopkins University, yoav.wald@gmail.com; Amir Feder, Technion, amirfeder@gmail.com; Daniel Greenfeld, Jether Energy Research, danielgreenfeld3@gmail.com; Uri Shalit, Technion, urishalit@technion.ac.il
Pseudocode | No | The paper describes methods and concepts but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | We are preparing the code for publication and will do our best to have it ready by the end of the review period.
Open Datasets | Yes | Using four datasets from the recently proposed WILDS OOD benchmark [23], as well as the Colored MNIST dataset [21], we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains.
Dataset Splits | Yes | In order to perform multi-domain calibration we modify the splits to include a multi-domain validation set whenever possible. See supplemental Section B for details and for additional results on Amazon Reviews. ... We specify hyperparameters and training details in the supplementary material (for both the WILDS benchmark and Colored MNIST).
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as CPU or GPU models or cloud computing resources.
Software Dependencies | No | The paper mentions using PyTorch in the ethics checklist, but it gives no version numbers for PyTorch or for any other software dependency needed to reproduce the experiments.
Experiment Setup | Yes | We specify hyperparameters and training details in the supplementary material (for both the WILDS benchmark and Colored MNIST). When using a training setup from other works (e.g. in Colored MNIST), we give a reference to the work and specify changes we made upon their setup.