(Almost) Provable Error Bounds Under Distribution Shift via Disagreement Discrepancy
Authors: Elan Rosenfeld, Saurabh Garg
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across numerous vision datasets (BREEDS [65], FMoW-WILDS [35], VisDA [51], DomainNet [53], CIFAR10, CIFAR100 [36] and OfficeHome [69]) demonstrate the effectiveness of our bound. (A sketch of the bound appears after the table.) |
| Researcher Affiliation | Academia | Elan Rosenfeld Machine Learning Department Carnegie Mellon University elan@cmu.edu Saurabh Garg Machine Learning Department Carnegie Mellon University |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We include the model with our publicly available repository. |
| Open Datasets | Yes | We conduct experiments across 11 vision benchmark datasets for distribution shift, spanning applications in object classification, satellite imagery, and medicine. We use four BREEDS datasets [65]: Entity13, Entity30, Nonliving26, and Living17; FMoW [11] and Camelyon [4] from WILDS [35]; OfficeHome [69]; VisDA [52, 51]; CIFAR10, CIFAR100 [36]; and DomainNet [53]. |
| Dataset Splits | Yes | We use source hold-out performance to pick the best hyperparameters for the UDA methods, since we lack labeled validation data from the target distribution. For all methods, we implement post-hoc calibration on validation source data with temperature scaling [25], which has been shown to improve performance (a temperature-scaling sketch follows the table). We use the original train set as source and the OOD val and OOD test splits as target domains, as they are collected over different time periods. Overall, we obtain 3 different domains. |
| Hardware Specification | Yes | Our experiments were performed across a combination of Nvidia T4, A6000, and V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'the standard pytorch implementation [19]' but does not provide specific version numbers for PyTorch or any other software dependencies. It also mentions 'Transfer Learning Library [31]' but without a version. |
| Experiment Setup | Yes | First, we tune the learning rate and ℓ2 regularization parameter, fixing for each dataset the batch size corresponding to the maximum we can fit in 15GB of GPU memory. We set the number of epochs for training as per the suggestions of the authors of the respective benchmarks. We summarize the learning rate, batch size, number of epochs, and ℓ2 regularization parameter used in our study in Table A.3. |
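
For context on the "bound" referenced in the Research Type row: the paper bounds target error via the disagreement discrepancy between the classifier and a trained critic. Below is a minimal NumPy sketch of how the empirical bound could be computed from hard predictions; the array names (`h_src`, `h_tgt`, `critic_src`, `critic_tgt`, `y_src`) are hypothetical placeholders, not identifiers from the authors' repository.

```python
import numpy as np

def empirical_disagreement_bound(h_src: np.ndarray, critic_src: np.ndarray,
                                 y_src: np.ndarray, h_tgt: np.ndarray,
                                 critic_tgt: np.ndarray) -> float:
    """Sketch of the disagreement-discrepancy bound on target error:

        eps_T(h) <= eps_S(h) + [ P_T(h != h') - P_S(h != h') ],

    where h is the classifier and h' is a critic trained to agree with h
    on source and disagree on target. All inputs are 1-D label arrays.
    """
    eps_s = np.mean(h_src != y_src)  # source error of h (labels available on source)
    # Disagreement discrepancy: critic/classifier disagreement on target minus source.
    delta = np.mean(h_tgt != critic_tgt) - np.mean(h_src != critic_src)
    return float(eps_s + delta)
```

The bound is "(almost) provable" in that it holds whenever the trained critic achieves at least as large a discrepancy as the true labeling function would.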
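
The Dataset Splits row also mentions post-hoc calibration with temperature scaling [25]. As a minimal PyTorch sketch, assuming `val_logits` and `val_labels` are hypothetical tensors holding the model's pre-softmax outputs and labels on the held-out source validation split:

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Fit a single scalar temperature T by minimizing NLL on validation data."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so that T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.01, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Usage: divide test-time logits by the fitted temperature before the softmax.
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = torch.softmax(test_logits / T, dim=-1)
```

Since temperature scaling rescales all logits by one scalar, it changes confidences but not predicted classes, which is why it can be applied post hoc without affecting accuracy.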