On Calibration and Out-of-Domain Generalization

Authors: Yoav Wald, Amir Feder, Daniel Greenfeld, Uri Shalit

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Using four datasets from the recently proposed WILDS OOD benchmark [23], as well as the Colored MNIST dataset [21], we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains.
Researcher Affiliation	Collaboration	Yoav Wald (Johns Hopkins University, yoav.wald@gmail.com); Amir Feder (Technion, amirfeder@gmail.com); Daniel Greenfeld (Jether Energy Research, danielgreenfeld3@gmail.com); Uri Shalit (Technion, urishalit@technion.ac.il)
Pseudocode	No	The paper describes methods and concepts but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	We are preparing the code for publication and will do our best to have it ready by the end of the review period.
Open Datasets	Yes	Using four datasets from the recently proposed WILDS OOD benchmark [23], as well as the Colored MNIST dataset [21], we demonstrate that training or tuning models so they are calibrated across multiple domains leads to significantly improved performance on unseen test domains.
Dataset Splits	Yes	In order to perform multi-domain calibration we modify the splits to include a multi-domain validation set whenever possible. See supplemental Section B for details and for additional results on Amazon Reviews. ... We specify hyperparameters and training details in the supplementary material (for both the WILDS benchmark and Colored MNIST).
Hardware Specification	No	The paper does not provide specific details regarding the hardware used for experiments, such as CPU or GPU models, or cloud computing resources.
Software Dependencies	No	The paper mentions using PyTorch in the ethics checklist, but it does not specify version numbers for PyTorch or any other software dependencies needed to reproduce the experiments.
Experiment Setup	Yes	We specify hyperparameters and training details in the supplementary material (for both the WILDS benchmark and Colored MNIST). When using a training setup from other works (e.g. in Colored MNIST), we give a reference to the work and specify changes we made upon their setup.
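
The Dataset Splits row mentions a multi-domain validation set used for multi-domain calibration. As a rough illustration of what checking calibration per domain could look like, the sketch below computes a binned expected calibration error (ECE) separately for each domain. This is an assumption-laden example, not the authors' code: the summary above does not state which calibration metric the paper uses, and the function names `expected_calibration_error` and `per_domain_ece`, the equal-width binning, and the default of 10 bins are all hypothetical choices made here.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over equal-width bins.

    Assumption: `confidences` are predicted probabilities in [0, 1] for the
    predicted class, and `correct` holds 1 (right) or 0 (wrong) per example.
    """
    bins = defaultdict(list)
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for items in bins.values():
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(h for _, h in items) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


def per_domain_ece(confidences, correct, domains, n_bins=10):
    """Group examples by domain label and compute ECE within each domain."""
    grouped = defaultdict(lambda: ([], []))
    for conf, hit, dom in zip(confidences, correct, domains):
        grouped[dom][0].append(conf)
        grouped[dom][1].append(hit)
    return {
        dom: expected_calibration_error(confs, hits, n_bins)
        for dom, (confs, hits) in grouped.items()
    }


if __name__ == "__main__":
    # Toy validation set with two hypothetical domains, "a" and "b".
    scores = per_domain_ece(
        confidences=[0.9, 0.9, 0.6, 0.6],
        correct=[1, 1, 1, 0],
        domains=["a", "a", "b", "b"],
    )
    for domain, ece in sorted(scores.items()):
        print(domain, round(ece, 3))
```

Reporting the error per domain, rather than pooling all validation examples, is the point of the sketch: a model can look well calibrated on pooled data while being miscalibrated within individual domains.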