Evaluating Robustness to Dataset Shift via Parametric Robustness Sets

Authors: Nikolaj Thams, Michael Oberst, David Sontag

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our approach to a computer vision task (classifying gender from images), revealing sensitivity to shifts in non-causal attributes. In a computer vision task, we find that this approach finds more impactful shifts than a reweighting approach, while taking far less time to compute, and that the resulting estimates of accuracy are substantially more reliable (see Section 4). We simulate $K = 100$ validation sets from $P$, in each estimating the worst-case shifts $\delta_{\text{Taylor}}$ (via the approach in Section 3.3) and $\delta_{\text{IS}}$, where the latter corresponds to minimizing $\hat{E}_{\delta,\text{IS}}$ using a standard non-convex solver from the scipy library [Virtanen et al., 2020]. We simulate ground truth data from $P_{\delta_{\text{IS}}}$ and $P_{\delta_{\text{Taylor}}}$, to compare the two shifts.
Researcher Affiliation | Academia | Nikolaj Thams, Dept. of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark (thams@math.ku.dk); Michael Oberst, CSAIL & IMES, MIT, Cambridge, MA (moberst@mit.edu); David Sontag, CSAIL & IMES, MIT, Cambridge, MA (dsontag@csail.mit.edu)
Pseudocode | No | The paper provides mathematical formulations and descriptions of the approach but does not include any explicit pseudocode blocks or sections labeled 'Algorithm'.
Open Source Code | Yes | Code is available at this link.
Open Datasets | Yes | To illustrate this use-case, we make use of the CelebA dataset [Liu et al., 2015], which contains images of faces and binary attributes (e.g., glasses, beard, etc.) encoding several features whose correlations may be unstable (e.g., the relation between gender and being bald).
Dataset Splits | No | We simulate $K = 100$ validation sets from $P$, in each estimating the worst-case shifts $\delta_{\text{Taylor}}$ (via the approach in Section 3.3) and $\delta_{\text{IS}}$, where the latter corresponds to minimizing $\hat{E}_{\delta,\text{IS}}$ using a standard non-convex solver from the scipy library [Virtanen et al., 2020].
Hardware Specification | No | The paper mentions finetuning a ResNet50 classifier but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using a 'standard non-convex solver from the scipy library' and 'finetuning a pretrained ResNet50 classifier', but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | No | The paper mentions a 'ResNet50 classifier', the '0/1 loss', and constraints such as '$\|\delta\|_2 \le \lambda = 2$', but does not provide specific experimental setup details such as learning rates, batch sizes, optimizers, or training schedules.
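The Research Type and Dataset Splits rows quote the paper's reweighting baseline: a worst-case shift $\delta_{\text{IS}}$ found by minimizing an importance-sampling accuracy estimate $\hat{E}_{\delta,\text{IS}}$ with a scipy solver. The following is a minimal sketch of that idea on a hypothetical toy problem, not the authors' code. Everything specific here is an assumption: three independent binary attributes whose logits $\eta$ shift to $\eta + \delta$, a made-up per-sample accuracy, a plain (unnormalized) importance-sampling estimator, a Euclidean-ball constraint $\|\delta\|_2 \le 2$, and SLSQP as the solver.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Hypothetical toy model: 3 independent binary attributes W_j with logits eta_j.
# A shift delta moves the logits to eta + delta; P(correct | W) stays fixed.
eta = np.array([0.0, -1.0, 0.5])
n = 5000
W = (rng.random((n, eta.size)) < expit(eta)).astype(float)
# Made-up 0/1 accuracy: the classifier errs more when W_0 = 1 and W_1 = 0.
correct = (rng.random(n) < 0.9 - 0.3 * W[:, 0] * (1 - W[:, 1])).astype(float)


def log_lik(logits):
    """Per-sample log-probability of W under independent Bernoulli(expit(logits))."""
    return (W * np.log(expit(logits)) + (1 - W) * np.log(expit(-logits))).sum(axis=1)


def acc_is(delta):
    """Importance-sampling estimate of accuracy under P_delta, using data drawn from P."""
    weights = np.exp(log_lik(eta + delta) - log_lik(eta))
    return np.mean(weights * correct)


def acc_is_grad(delta):
    """Gradient of acc_is; d/d delta_j of the log-weight is W_j - expit(eta_j + delta_j)."""
    weights = np.exp(log_lik(eta + delta) - log_lik(eta))
    return ((weights * correct)[:, None] * (W - expit(eta + delta))).mean(axis=0)


lam = 2.0
# Worst case: minimize the estimated accuracy subject to ||delta||_2 <= lam.
res = minimize(
    acc_is,
    x0=np.zeros_like(eta),
    jac=acc_is_grad,
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda d: lam**2 - d @ d, "jac": lambda d: -2 * d}],
)
delta_is = res.x
print("delta_IS:", np.round(delta_is, 3))
print("accuracy estimate at delta = 0:", round(acc_is(np.zeros_like(eta)), 3))
print("IS accuracy estimate at delta_IS:", round(acc_is(delta_is), 3))
```

Every objective evaluation here recomputes the importance weights over the whole validation sample, so the cost of the search grows with both the sample size and the number of solver steps.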
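The same rows mention $\delta_{\text{Taylor}}$, obtained "via the approach in Section 3.3" of the paper. As a hedged illustration of the general idea, not a reproduction of Section 3.3, one can replace the shifted accuracy $E_\delta$ by its second-order expansion $E_0 + g^\top\delta + \tfrac12\delta^\top H\delta$ around $\delta = 0$ and minimize that quadratic over an assumed ball $\|\delta\|_2 \le \lambda$. This is a trust-region subproblem whose global minimizer follows from an eigendecomposition of $H$ plus a one-dimensional root search. The toy model is hypothetical: independent Bernoulli attributes with logit shifts and a made-up accuracy pattern. For it, the score of the shifted density at $\delta = 0$ is $W - p$ with $p = \sigma(\eta)$, which gives $g = E[\text{acc}\,(W - p)]$ and $H = E[\text{acc}\,((W - p)(W - p)^\top - \operatorname{diag}(p(1 - p)))]$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

rng = np.random.default_rng(0)

# Hypothetical toy model: independent binary attributes with logits eta, shifted
# to eta + delta; correct[i] is a made-up 0/1 accuracy on sample i.
eta = np.array([0.0, -1.0, 0.5])
n = 5000
p = expit(eta)
W = (rng.random((n, eta.size)) < p).astype(float)
correct = (rng.random(n) < 0.9 - 0.3 * W[:, 0] * (1 - W[:, 1])).astype(float)

# Gradient and Hessian of E_delta at delta = 0, estimated from the sample.
s = W - p  # score of the shifted density at delta = 0
g = (correct[:, None] * s).mean(axis=0)
H = (correct[:, None] * s).T @ s / n - correct.mean() * np.diag(p * (1 - p))


def solve_trs(g, H, lam):
    """Globally minimize g @ d + 0.5 * d @ H @ d subject to ||d||_2 <= lam."""
    evals, Q = np.linalg.eigh(H)
    a = Q.T @ g
    lo = max(0.0, -evals[0])

    def step(nu):
        # Solution of (H + nu I) d = -g, written in H's eigenbasis.
        return -Q @ (a / (evals + nu))

    # Interior case: H positive definite and the Newton step fits in the ball.
    if evals[0] > 0 and np.linalg.norm(step(0.0)) <= lam:
        return step(0.0)
    # Boundary case: ||step(nu)|| decreases in nu, so bracket the root and solve.
    nu_lo = lo + 1e-10 * (1.0 + lo)
    if np.linalg.norm(step(nu_lo)) >= lam:
        nu_hi = lo + 1.0
        while np.linalg.norm(step(nu_hi)) > lam:
            nu_hi = lo + 2.0 * (nu_hi - lo)
        nu = brentq(lambda v: np.linalg.norm(step(v)) - lam, nu_lo, nu_hi)
        return step(nu)
    # Rough handling of the "hard case": pad along the lowest-curvature eigenvector.
    d = step(nu_lo)
    tau = np.sqrt(max(lam**2 - d @ d, 0.0))
    return d + tau * Q[:, 0]


lam = 2.0
delta_taylor = solve_trs(g, H, lam)
acc0 = correct.mean()
approx = acc0 + g @ delta_taylor + 0.5 * delta_taylor @ H @ delta_taylor
print("delta_Taylor:", np.round(delta_taylor, 3))
print("Taylor-approximated accuracy:", round(approx, 3))
```

Once $g$ and $H$ are estimated, the search over $\delta$ in this sketch costs one $d \times d$ eigendecomposition and a scalar root search, independent of the sample size; that is one intuition for the quoted speed advantage over reweighting.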
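The Experiment Setup row mentions the 0/1 loss and a constraint with $\lambda = 2$, and the Research Type row quotes the paper as simulating ground-truth data from the shifted distributions to compare shifts. The sketch below shows that kind of check on a hypothetical toy model. It takes a shift on the boundary of an assumed Euclidean ball $\|\delta\|_2 \le 2$, computes the exact expected 0/1 loss under $P_\delta$ by enumerating all attribute configurations, and compares it with a Monte Carlo estimate from freshly simulated data. The attribute model and the accuracy function `p_correct` are made up for illustration.

```python
import itertools

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)

# Hypothetical toy model: attributes W_j ~ Bernoulli(expit(eta_j + delta_j)),
# and the classifier is correct with probability p_correct(W).
eta = np.array([0.0, -1.0, 0.5])


def p_correct(W):
    """Made-up probability that the classifier is correct given attributes W."""
    return 0.9 - 0.3 * W[..., 0] * (1 - W[..., 1])


def exact_01_loss(delta):
    """Expected 0/1 loss under P_delta, by enumerating all attribute configurations."""
    q = expit(eta + delta)
    total = 0.0
    for w in itertools.product([0.0, 1.0], repeat=eta.size):
        w = np.array(w)
        prob = np.prod(np.where(w == 1.0, q, 1 - q))
        total += prob * (1 - p_correct(w))
    return total


def simulated_01_loss(delta, n=200_000):
    """Monte Carlo 'ground truth': draw fresh data from P_delta and average the 0/1 loss."""
    W = (rng.random((n, eta.size)) < expit(eta + delta)).astype(float)
    correct = rng.random(n) < p_correct(W)
    return np.mean(~correct)


# A shift on the boundary of the assumed robustness set ||delta||_2 <= lam = 2.
lam = 2.0
u = np.array([1.0, -1.0, 0.0])
delta = lam * u / np.linalg.norm(u)

print("exact 0/1 loss under P_delta:", round(exact_01_loss(delta), 4))
print("simulated 0/1 loss under P_delta:", round(simulated_01_loss(delta), 4))
print("exact 0/1 loss without shift:", round(exact_01_loss(np.zeros_like(eta)), 4))
```

Enumeration is only feasible for a handful of binary attributes; with real image data the simulated check is the practical option.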